I don't know exactly how OpenAI does it, but I think it's probably something like this: the OpenAI server that generates the responses reads and executes the ‘code that LLM wants to execute’ that exists in the LLM output in some way, and then passes the execution results as internal conversation history.
As a concrete method, for example, the code to be executed could be enclosed in tags such as <run-code>, or it may be achieved using metadata at a lower layer.
However, although GPT-4o should have a code execution function, it is possible that the code is not actually being executed and that the LLM is simply generating output that looks like (but is not actually) correct.
I have always used o1 pro and deep research, but these are only available through the web UI.
there is no doubt that cursor and others have a better UI, but the demand for this type of tool exists because OpenAI does not release an API
This is self-promotional, but https://github.com/nahco314/feed-llm has TUI to choose what to give to llm. There are many similar tools out there, but I think this approach is relatively effective for larger code bases.
I think this means that the cost of cutting-edge models such as o3 pro has become too high, and that ordinary users can no longer use them freely.
I fear that, even though it is clear that smarter models are optimal, OpenAI will use cheaper models for its own convenience.
In my experience, I think it’s primarily a mental health issue. My passion for computers has always been rooted in a fundamental curiosity, rather than any deep reason or meaning. Simply writing code and watching software run used to excite me—it was fun, pure and simple.
There was a time, though, when it all felt utterly meaningless and made me depressed. I found myself thinking things like, “Even if I build this, it won’t make any money,” or “This job may pay, but it doesn’t solve any real, fundamental problems.” Intellectually, I knew I used to do it just because I enjoyed it, yet I couldn’t stop those thoughts.
However, as my depression went into remission, I started to feel that simple joy again (though I’m not fully recovered yet).
I suspect you’re going through something similar. While it’s important to think, I believe that rather than overthinking, focusing on your mental health first is the better approach—though of course, I understand how hard that can be.
I've been going through that a year or so back and I believe the trigger was mostly doing meaningless (technically) work on top of shifting ground (because it was either rushed and no one actually care about quality). And there was the syndrome of shiny stuff (because consumerism) and FOMO. I took a step back, bought some cheap office PC (a mini desktop and a laptop), installed Linux first (alpine), then FreeBSD. Then it's been an exploration of what I can do with computers, not just whatever some company allowed me to do.
What I concocted was more fragile, but I either understood it completely or it would be easy to actually learn how it works. This feeling of control worked wonder for me.
They are very good at searching for niche results that are not often found by simply looking them up on Google. It is often depressing to find a better alternative after building software, but if you ask deep research first, they will find all the competition.
I created independent binary for the sake of using the same thing in bash/fish/powershell and for the ease of adding functions, but for simple use, that way would be better.
However, although GPT-4o should have a code execution function, it is possible that the code is not actually being executed and that the LLM is simply generating output that looks like (but is not actually) correct.