There's about a 10-percentage-point improvement left (i.e., from 80% to 90%) before it starts to stagnate. We've seen the same with predictive models benchmarked on ImageNet et al.
It's funny to me that we look at GPT-4 scoring high on all these tests and think it means anything, when educators and a lot of us here have been lamenting standardized tests since Bush made them a preeminent feature of our country's education system. They are not a good measure of intelligence. They measure how well you can take a test.
Funny -- I literally had someone tell me this same thing this morning... but the exact same guy last week was arguing with me against the reduced importance of these tests for college admissions. Last week he was arguing that these tests were critical to the college admissions process, but this morning the same tests were basically worthless.
Not saying you hold the same opinions -- but I wouldn't be surprised if people's take on these tests is more about what is convenient for their psyche than any actual principled position.
In principle I agree. On one hand, we can conclude that IQ is indeed important; on the other, we are horrible at measuring it. That being said, there is a country mile of difference between these tests in how suitable they are for the purposes they're being used for.
We mean beating humankind at the task, swiftly followed by humankind declaring that the task wasn't a sign of proper intelligence anyway, and moving its goalposts to a different field.
There's no way there's only 10% left to improve in those models. New versions are coming out regularly that are clearly improved. Midjourney v5 and GPT-4 were just released showing huge improvements, for example.
Not only that, but the innovation around this tech is also just getting started. It's immediately applicable for business use. The classical techniques still have their uses, of course.
It's not that there's only 10% left to improve. It's that the data needed, compute requirements, and model size are as intensive getting from 80 to ~85 or ~90 as they are getting from 0 to 80. See https://paperswithcode.com/sota/image-classification-on-imag...
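To make the diminishing-returns point concrete, here's a rough sketch assuming error falls off as a power law in compute, error ∝ compute^(-ALPHA). The exponent ALPHA = 0.3 and the helper name compute_multiplier are purely hypothetical, chosen for illustration and not fitted to ImageNet or any real leaderboard.

```python
# Hypothetical power-law scaling: error = A * compute ** (-ALPHA).
# ALPHA is an illustrative assumption, not a value fitted to ImageNet.
# The constant A cancels out when taking ratios, so it is omitted.
ALPHA = 0.3


def compute_multiplier(acc_from: float, acc_to: float) -> float:
    """How many times more compute it takes to go from acc_from to acc_to
    under the assumed power law (accuracies given as fractions, e.g. 0.8)."""
    err_from = 1.0 - acc_from
    err_to = 1.0 - acc_to
    # Inverting error = A * C**(-ALPHA) gives C = (A / error) ** (1 / ALPHA),
    # so the ratio of compute budgets is (err_from / err_to) ** (1 / ALPHA).
    return (err_from / err_to) ** (1.0 / ALPHA)


if __name__ == "__main__":
    for start, end in [(0.80, 0.85), (0.80, 0.90)]:
        print(f"{start:.0%} -> {end:.0%}: {compute_multiplier(start, end):.1f}x compute")
```

With these made-up numbers, halving the error from 20% to 10% costs about 2^(1/0.3), roughly 10x, the compute, and each further halving costs another ~10x. That is why the last few points feel so expensive even when models keep improving.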