> “It’s not scientific progress unless you understand where the improvement is coming from.”
I don’t agree with this. If you can chronicle improvement, that is progress. Giving a satisfying linguistic description of that improvement, when possible, might be more progress, but merely documenting it is extremely important scientific progress in its own right.
Overall this essay was extremely hard to read and should be cut down by about 75%. The whole wolpertinger thing is nothing but a distraction. Just say AI is a mixture of disciplines and serves a mixture of outcomes. It only takes away from the essay to act like you’re being literary or nuanced with the wolpertinger thing when all it does is subtract from the arguments.
And to boot, after so many words, the final advice is extremely hollow... literally just saying,
> “And so we should try to do better along lots of axes.”
How should we improve? I guess by “doing better” on “multiple axes.”
The section on “antidotes” is hardly better, saying:
> “I will suggest two antidotes. The first is the design practice of maintaining continuous contact with the concrete, nebulous real-world problem. Retreating into abstract problem-solving is tidier but usually doesn’t work well.”
Except this is already what basically everyone tries to do. Research labs try to maintain direct contact with state of the art benchmark tasks on a wide variety of data sets. And often they work extremely hard to produce results robust across several tasks and several data sets.
And in various other fractured or specific cases, the researchers are very clear up-front they are solving one particular, ultraspecific problem in the scope of the paper.
(Unfortunately the second antidote is more “wolpertinger”... ugh.)
> And to boot, after so many words, the final advice is extremely hollow... literally just saying,
> > “And so we should try to do better along lots of axes.”
> How should we improve? I guess by “doing better” on “multiple axes.”
That's not what the final advice is, the author is suggesting the use of "meta-rationality":
> "AI is a wolpertinger: not a coherent, unified technical discipline, but a peculiar hybrid of fields with diverse ways of seeing, diverse criteria for progress, and diverse rational and non-rational methods. Characteristically, meta-rationality evaluates, selects, combines, modifies, discovers, creates, and monitors multiple frameworks."
Although not expanded on in this essay, it seems like the whole blog is dedicated to the topic.
> "That's not what the final advice is, the author is suggesting the use of 'meta-rationality'"
I think you misread that section of the essay, because the whole conclusion of the meta-rationality section was the quote that I already gave in my comment, “And so we should try to do better along lots of axes.”
Literally, that is the sum-up of advice in the lone section of the essay that possibly has any call to action or advice. It gives a fairly quick and superficial overview of meta-rationality (which is OK), but does not say anything at all about putting it into practice except for "doing better" on "multiple axes" (literally, this is all it says).
So when you say the "final advice" is meta-rationality -- that's already what I was talking about. That's exactly the part where the essay fails to give any type of actionable payoff at all.
“The whole wolpertinger thing” is a metaphor and a literary device. This isn’t a technical manual.
I don’t know why you found this hard to read. The writing is clear and understandable. The dismissiveness of your comment, and the fact that its out-of-hand rejection is based on nothing objective, suggests that you’re not the target audience, especially since, in contrast to your comment, the thoughts in this article are researched, decently sophisticated, and presented in a discursive manner.
I found the article to be meandering, unclear, messy, and not based on an active appraisal of the way progress in AI work is already judged. I don’t know why you claim my comment has “dismissiveness”; it does not. Pointing out that the failures of the writing and arguments make the essay hard to read and leave it without any useful conclusion or call to action is not dismissive at all. On the contrary, I gave up a lot of time to interact with the essay by reading it and reflecting on it. It’s just not a good essay.
> not based on an active appraisal of the way progress in AI work already is judged.
I don't think that's accurate at all. From the article:
> "Adjacent to engineering is the development of new technical methods. This is what most AI people most enjoy. It’s particularly satisfying when you can show that your new system architecture does Z% better than the competition."
The ImageNet competition results from 2012 were the major turning point that exploded AI research, specifically in that computer vision was able to beat human-level classification. Similarly for chess previously, and more recently Go.
Goodfellow's work on GANs and Pearl's work on Bayesian causality are the only major exceptions I see right now that are not based on competitive improvement around a baseline. No other major scientific field approaches it this way.
I disagree very strongly. Many fields over long periods of the history of science have oriented themselves around benchmark problems.
Some things which come to mind are:
- C. elegans for connectomics
- Drosophila experiments for a wide range of biology benchmarks
- even previously in computer vision there was the so-called "chair challenge" [0], and dozens and dozens of canonical face detection, object detection, and segmentation data sets used frequently as benchmarks across many papers
- in Bayesian statistics there are various canonical data sets for evaluating theoretical improvements in hierarchical models and general regression
- in finance there is CRSP and the Kenneth French Data Library
It's very common across many fields to orient around benchmark problems and data sets, and it has been for a really long time. This is not at all new with ImageNet, not even just in the tiny world of computer vision.
That makes no sense. Attempting reproducibility and discovering that it cannot be obtained is also science. It's like you are trying to say science can only be defined by positive results, not negative results.
Science can be defined by positive or negative results... it's just that the single thing science "is about" (whether a positive instance or negative instance) is reproducibility. If you can reliably reproduce a behavior that you cannot explain, that's hugely scientific. If you can show reproducibility wasn't achieved in some certain conditions, that's also hugely scientific. Reproducibility is the thing.
That is science, yes. But if you are not attaining reproducible results, it is hard to argue you are making progress.
I agree that not all experiments should be in the positive or negative category; ideally there would be a solid mix. However, the spin of this and other stories is that pretty much no studies are seeing healthy replication, and nobody can really say why, outside of p-hacking and the like, which is fairly universally agreed not to be progress.
I agree totally with what you say here, but circling back to the original quote from the essay: the author tries to say that any result you cannot explain is not scientific progress, and that is what I disagreed with originally. Often we can get reproducible results about repeatable behavior or phenomena that “just work” without a solid, low-level, reductionist explanation of why, and in those cases it’s totally OK and still counts as valid progress. Similarly, when we try to reproduce a result and cannot, that is progress in the sense of ruling out the result, shifting the burden of evidence back onto the original researcher, and casting proper doubt on it. Not usually as exciting or impactful as positive results, but progress nonetheless.
I totally agree we live in a world where publication incentives create perverse anti-science problems, with file drawer bias, p-hacking, falsifying data, etc.
I'm just saying that, in the essay, the author seems to go way too far in claiming that it can only be scientific if you can tack on some type of "explanation" (and we could even debate what that means and how you would know if you have the 'right' or 'complete' explanation).
I think you're raising a fair point. I was won over by the distinction of scientific versus engineering progress. The idea being that scientific progress had to have added something to our scientific understanding. An example I used in a sibling post was how you don't necessarily learn more about ballistics if the only way you could hit targets was a more powerful gun.
That said, I have to grant that is probably too reductionist. I think I like the idea, as it is just trying to be specific with types of progress, but I don't know if that is an accepted standard, or just one being proposed.