
The intensely negative reaction to GPT-5 is a bit weird to me. It tops the charts in an elaborate new third-party evaluation of model performance at law, medicine, etc. [0]. It's true that it was a bit of an incremental improvement over o3, but o3 was a huge leap in capabilities over GPT-4, a completely worthy claimant to be the next generation of models.

I will be the first person to say that AI models have not yet realized the economic impact they promised - not even close. Still, there are reasons to think that there's at least one more impressive leap in capabilities coming, based on both frontier model performance in high-level math and CS competitions, and the current focus of training models on more complex real-world tasks that take longer to do and require using more tools.

I agree with the article that OpenAI seems a bit unfocused and I would be very surprised if all of these product bets play out. But all they need is one or two more ChatGPT-level successes for all these bets to be worth it.

[0] https://mercor.com/apex/



I think a lot of it is a reaction to the hype before the launch of GPT-5. People were sold on, and were expecting, a noticeably big step (akin to the GPT-3.5 to GPT-4 jump), but in reality it's not that much noticeably better for the majority of use cases.

Don't get me wrong, I actually quite like GPT-5, but this is how I understand the backlash it has received.


Yeah, that is fair. I admit to being a bit bummed out as well. One might almost say that if o3 was effectively GPT-5 in terms of performance improvement, then we were all really hoping for a GPT-6, and that's not here yet. I am pretty optimistic, based on the information I have, that we will see GPT-6-class models which are correspondingly impressive. Not sure about GPT-7, though.


Honestly, I’m skeptical of that narrative. I think AI skeptics were always going to be shrill about how overhyped it was and how right this proves they were. Seriously, how good would GPT-5 have had to be for Ed to NOT write this exact post?

I’m very happy with GPT-5, especially as a heavy API user. It’s very cost-effective for its capabilities. I’m sure GPT-6 will be even better, and I’m sure Ed and all the other people who hate AI will call it a nothing burger too. So it goes.


> based on both frontier model performance in high-level math and CS competitions

IMO the only takeaway from those successes is that RL for reasoning works when you have a clear reward signal. Whether this RL-based approach to reasoning can be made to work in more general cases remains to be seen.

There is also a big disconnect between how these models do so well on benchmark tasks like these, which they've been specifically trained for, and how easily they still fail at everyday tasks. Yesterday I had the just-released Sonnet 4.5 fail to properly do a unit conversion from radians to arcseconds as part of a simple problem - it was off by a factor of 3. Not exactly PhD-level math performance!
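For reference (the commenter's original prompt isn't shown, so this is just the standard conversion, not a reconstruction of the model's error): 1 radian = (180/π) degrees, and each degree is 3600 arcseconds, so 1 rad ≈ 206,264.8 arcsec. A minimal sketch in Python:

```python
import math

# 1 rad = (180/pi) degrees; 1 degree = 3600 arcseconds
RAD_TO_ARCSEC = math.degrees(1) * 3600  # ~= 206264.8

def rad_to_arcsec(rad: float) -> float:
    """Convert an angle from radians to arcseconds."""
    return rad * RAD_TO_ARCSEC

print(round(rad_to_arcsec(1.0)))  # 206265
```

An off-by-3 result like the one described would be a logic slip (e.g. mixing up a π vs. 3 approximation somewhere), not a hard computation - which is the commenter's point.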


I mean, I agree. There is not yet a clear path/story as to how a model can provide a consistently expert-performance on real-world tasks, and the various breakthroughs we hear about don't address that. I think the industry consensus is more just that we haven't correctly measured/targeted those abilities yet, and there is now a big push to do so. We'll see if that works out.


I agree. I mean, I can get o3 right from the API if I choose, but 5-Thinking is better than o3, and 5-Research is definitely better than o3-pro in both ergonomics and output quality. If you read Reddit threads about 4o, the main group complaining seems to be the one that formed a parasocial relationship with 4o and relied on its sycophancy. Interesting from a product-market-fit perspective, but not worrying as to "Is 5 on the whole significantly better than 4 / o1 / o3?" It is. Well, 5-mini is a dumpster fire, and awful. But I do not use it. I'm sure it's super cheap to run.

Another way to think about the OpenAI business situation is: are customers using more inference minutes than a year ago? I definitely am. Most definitely. For multiple reasons: agent round-trip interactions, multimodal parsing, parallel Codex runs...



