
Last year o3 high scored 88% on ARC-AGI-1 at more than $4,000/task. This model, at its X high configuration, scores 90.5% at just $11.64 per task.

General intelligence has gotten ridiculously less expensive. I don't know if it's because of compute and energy abundance, improved efficiency in attention mechanisms, or both, but we have to acknowledge the bigger picture and relative prices.
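Rough arithmetic on the figures quoted above (the o3 number is "more than $4,000", so this is a lower bound on the reduction):

```python
o3_cost_per_task = 4000.00   # reported o3 high cost per ARC-AGI-1 task (USD, lower bound)
new_cost_per_task = 11.64    # this model's quoted cost per task (USD)

reduction = o3_cost_per_task / new_cost_per_task
print(f"~{reduction:.0f}x cheaper per task")  # ~344x
```

So the per-task cost dropped by more than two orders of magnitude in about a year, while the score went up.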





Sure, but the reason I'm confused by the pricing is that the pricing doesn't exist in a vacuum.

Pro barely performs better than Thinking in OpenAI's published numbers, but comes at ~10x the price, with an explicit disclaimer that it's slow, on the order of minutes.

If the published performance numbers are accurate, it seems like it'd be incredibly difficult to justify the premium.

At least on the surface level, it looks like it exists mostly to juice benchmark claims.


It could be using the same early trick as Grok (at least in its earlier versions): spawn 10 agents that work on the problem in parallel, then take a consensus on the answer. That would explain both the price and the latency.

Essentially a newbie trick that works really well but isn't efficient, while still looking like an amazing breakthrough.

(if someone knows the actual implementation I'm curious)
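The parallel-agents-plus-consensus idea described above can be sketched as below. This is a toy illustration, not the actual implementation (which the comment notes is unknown); `solve` is a hypothetical stand-in for one independent model call, and the consensus step is a simple majority vote:

```python
import collections
import concurrent.futures

def solve(task, seed):
    # Hypothetical single-agent call: stands in for one model
    # completion sampled independently (e.g., with its own seed).
    # Here it returns a deterministic toy answer for illustration.
    return f"answer-for-{task}"

def consensus_solve(task, n_agents=10):
    """Run n_agents solvers in parallel and return the majority answer."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda s: solve(task, s), range(n_agents)))
    # Majority vote: the most common answer across agents wins.
    winner, votes = collections.Counter(answers).most_common(1)[0]
    return winner, votes

best, votes = consensus_solve("arc-task-001", n_agents=10)
```

The cost scales linearly with `n_agents` while accuracy gains flatten quickly, which is why it reads as an inefficient but effective trick.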


The magic number appears to be 12 in the case of GPT 5.2 pro.


