
That ARC AGI score is a little suspicious. That's a really tough benchmark for AI. Curious if there were improvements to the test harness, because that's a wild jump in general problem-solving ability for an incremental update.




They're clearly building better training datasets and doing extensive RL on these benchmarks over time. The out-of-distribution performance is still awful.

I don’t think their words mean much of anything; only the behavior of the models does.

Still waiting on Full Self Driving myself.



