That ARC AGI score is a little suspicious. That's a really tough for AI benchmar... | Hacker News


causal 3 days ago | on: GPT-5.2

That ARC-AGI score is a little suspicious. That's a really tough benchmark for AI. I'm curious whether there were improvements to the test harness, because that's a wild jump in general problem-solving ability for an incremental update.

woeirua 3 days ago

They're clearly building better training datasets and doing extensive RL on these benchmarks over time. The out-of-distribution performance is still awful.

taurath 3 days ago

I don’t think their words mean much of anything; only the behavior of the models does.

Still waiting on Full Self-Driving myself.
