I'm excited to see whether the instruction following improvements play out in the use of Codex.

The biggest issue I've seen _by far_ with using GPT models for coding has been their inability to follow instructions... and their tendency to act again on messages from earlier in the thread instead of on what you just asked for.



I think that's part of the issue I have with it constantly.

Let's say I'm solving a problem. I suggest strategy Alpha, and a few prompts later I realize it isn't going to work. So I suggest strategy Bravo, but for whatever reason the model holds on to ideas from Alpha, and the output is a mix of the two. Even if I say to forget about Alpha entirely, the Bravo solution will contain pieces that only make sense under Alpha. I usually just start a new chat at that point and hope the model isn't relying on previous chat context.

This is a hard problem to solve because it's hard to communicate our internal compartmentalization to a remote model.


Unfortunately, if it's in context then the model can stay tethered to the subject. Asking it not to pay attention to a subject doesn't remove attention from it, and probably actually reinforces it.

If you use the API playground, you can edit out dead ends and other subjects you don't want addressed anymore in the conversation.
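
The same pruning works when calling the API directly, since you control the full messages list on every request. A minimal sketch using the openai Python SDK (the model name and message contents are illustrative): whatever you leave out of the list simply cannot be attended to, which is stronger than asking the model to ignore it.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    history = [
        {"role": "user", "content": "Let's try strategy Alpha: ..."},
        {"role": "assistant", "content": "Here's an Alpha-based approach: ..."},
        {"role": "user", "content": "Actually, switch to strategy Bravo: ..."},
    ]

    # Drop the abandoned Alpha exchange instead of telling the model to
    # forget it; pruned context can't leak into the answer.
    pruned = history[2:]

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=pruned,
    )
    print(response.choices[0].message.content)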


Claude models do not have this issue. I now use GPT models only for very short conversations. Claude has become my workhorse.


That's just how context works. If you're going to backpedal, go back in the conversation and edit your prompt or start a new session. I'll frequently ask for options, get them, then edit that prompt and just tell it to do whatever I decided on.


I've only had that happen when I use /compact, so I just avoid compacting altogether in Codex/Claude. No great loss, and I'm extremely skeptical anyway that the compacted summary will actually distill the specific actionable details I want.


Huh, really? That's the exact opposite of my experience. I find gpt-5-high to be by far the most accurate of the models at following instructions over a long conversation. It's also much less prone to losing focus as the context grows.

Are you using the -codex variants or the normal ones?



