versteegen 68 days ago \| on: Claude Sonnet 4.6

I'd agree that this effect is probably mainly due to architectural parameters, such as the number and dimensions of heads and the hidden dimension, rather than model size (number of parameters) or less training. I saw something about Sonnet 4.6 having had a greatly increased amount of RL training compared to 4.5.