I've been really impressed with codex so far. I have been working on a flight simulator hobby project for the last 6 months and finally came to the conclusion that I need to switch from floating origin, which my physics engine assumes with the coordinate system it uses, to a true ECEF coordinate system (what underpins GPS). This involved a major rewrite of the coordinate system, the physics engine, even the graphics system and auxilary stuff like asset loading/unloading etc. that was dependent on local X,Y,Z. It even rewrote the PD autopilot to account for the changes in the coordinate system. I gave it about a paragraph of instructions with a couple of FYIs and... it just worked! No major graphical glitches except a single issue with some minor graphical jitter, which it fixed on the first try. In total took about 45 minutes but I was very impressed.
I was unconvinced it had actually, fully ripped out the floating origin logic, so I had it write up a summary and then used that as a high level guide to pick through the code and it had, as you said, followed the instructions to the letter. Hugely impressive. In march of 2023 OpenAI's products struggled to draw a floating wireframe cube.
I'd been imagining taking the Zig Language Server and adding some refactorings to it—it only had a bare minimum like Rename Symbol. It seemed like a huge project with so much context to get familiar with, so I put it off indefinitely. Then on a whim I decided to just ask GPT-5 (this was before Codex, even, I think?) to give it a go. Plopped it down in the repo and said, basically, implement "Extract Function". And it just kind of... did. The code wasn't beautiful, I could barely understand it, some of which must perhaps be blamed on the existing codebase not being exactly optimized for elegance, but it actually worked. On the first try! We continued to implement a few more refactorings. Eventually I realized the code we were churning out actually needs major revision and rewriting—but it took me from less than zero to "hey, this is actually provably possible and we have a working PoC" in, like, fifteen minutes. Which is pretty insanely valuable.
I think it kind of shines in this type of task. I am building my own game engine and it's very good for this type of refactoring. On some other tasks though, it clearly makes bad architectural decisions imo, like more junior developer might not get them but for instance in my game engine, it often tries to be too generalist like trying to build something akin to Unity that can do all sorts of games rather than focus on the type of game I am building it for unless I very explicitly always say it
It’d be probably useful to include this very comment in your system prompt or a separate file which you ask the coding agent to read at the beginning of each session.
I was unconvinced it had actually, fully ripped out the floating origin logic, so I had it write up a summary and then used that as a high level guide to pick through the code and it had, as you said, followed the instructions to the letter. Hugely impressive. In march of 2023 OpenAI's products struggled to draw a floating wireframe cube.