I personally agree that this experiment is evidence that there are certain problems that cannot be solved simply by making the models bigger, and one of the main research questions I'm interested in is what we need to do to elicit more reasoning-like capabilities from them.
There are people who fall more on the side of bitter-lesson/scaling-law-maximalism, and I think it's probably healthy and valuable that there are people in the research community placing both types of bet.
> elicit more reasoning-like capabilities from them.
But will branch-and-search with some noisy evaluator get us there? Obviously this is easier in some domains than others.
Or maybe simply using increasingly large "prompts" or "input specifications" to specify the desired end result. There might be a whole scaling law hiding there...
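To make the branch-and-search idea concrete, here's a toy sketch: beam search over candidate solutions, ranked by an evaluator whose scores are corrupted with Gaussian noise. Everything here (the bit-string task, `noisy_score`, `expand`, the beam width) is illustrative, not from any particular system; the point is just that a noisy evaluator can still steer the search if the beam is wide enough relative to the noise.

```python
import heapq
import random

# Hypothetical goal the evaluator prefers; stands in for "the desired end result".
TARGET = [1, 0, 1, 1, 0, 1, 0, 1]

def noisy_score(candidate, noise=0.5):
    """True score (agreement with TARGET) plus Gaussian noise."""
    true = sum(a == b for a, b in zip(candidate, TARGET))
    return true + random.gauss(0, noise)

def expand(candidate):
    """Branch step: grow each partial solution by one bit."""
    return [candidate + [0], candidate + [1]]

def beam_search(beam_width=4, depth=len(TARGET)):
    beam = [[]]
    for _ in range(depth):
        children = [c for cand in beam for c in expand(cand)]
        # Prune: keep only the top-scoring candidates, despite the noisy scores.
        beam = heapq.nlargest(beam_width, children, key=noisy_score)
    return beam[0]

best = beam_search()
print(best)
```

With `noise=0.5` the search usually recovers most of TARGET; crank the noise up and it degrades gracefully rather than failing outright, which is roughly the bet behind using an imperfect model as the evaluator.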