
I'd agree that this effect is probably due mainly to architectural parameters, such as the number and dimensions of heads and the hidden dimension, rather than to model size (number of parameters) or less training.

I saw something about Sonnet 4.6 having had a much larger amount of RL training than 4.5.


