I believe there was an article/paper in the last few months about that exact issue

Someone was saying that with an increasing number of attempts, or increasing context length, LLMs are less and less likely to solve a problem

(I searched for it but can't find it)

That matches my experience -- the corrections in long context can just as easily be anti-corrections, e.g. turning something that works into something that doesn't work

---

Actually it might have been this one, but there are probably multiple sources saying the same thing, because it's true:

Context Rot: How Increasing Input Tokens Impacts LLM Performance - https://news.ycombinator.com/item?id=44564248

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows.

---

As for this question: how did we manage to create things like the Linux kernel or the Mars landers without AI?

It's because human intelligence is a totally different thing from LLMs (contrary to what people with a stake in the outcome will tell you)

Carmack said there are at least 5 or 6 big breakthroughs left before "AGI", and I think even that is a misleading framing. It's certainly possible that "AGI" will not be reached - there could be hardware bottlenecks, software/algorithmic questions, or other obstacles we haven't thought of

That is, I would not expect AI to create anything like the Linux kernel. The burden of proof is on the people who claim it will, not the other way around!