Coming from a background in bioinformatics, I had a rather different experience of academia: combining multiple sources of data and results from different prediction algorithms is quite common there.
As for generic machine learning being a ridiculous idea, I don't see why he'd think that. Nearly all specialized systems use generic machine learning algorithms as a submodule, and they could very well be commoditized the way EC2 commoditized compute. Even Google has an upcoming framework for this. I would agree, though, that by themselves they are not sufficient.
edit: I also think you miss the point of academic papers. The goal is not to build a product, but rather to understand algorithms. Testing algorithms in isolation from other boosters is crucial for this. Only if you are testing a particular combining framework does it make sense to include multiple approaches within the context of the proposed idea.
In bioinformatics, you additionally have researchers who actually want an applied answer for their studies and work, so in that area you do routinely get something more like a product being produced in an academic setting. The combined systems are often, but not always, published.
The issue is that generic machine learning algorithms work well enough as black boxes, but to squeeze top performance out of them you need feature engineering, architecture and model-structure futzing, method selection, and so on, and in practice there are far too many of these meta-hyperparameters to tune with cross-validation or anything similar.
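A quick back-of-the-envelope sketch of why exhaustively cross-validating every meta-choice blows up (the option counts here are made up for illustration):

```python
from itertools import product

# Hypothetical design choices a practitioner faces before any
# ordinary hyperparameter tuning even starts.
feature_sets = ["raw", "log-scaled", "interactions", "domain-engineered"]
model_families = ["linear", "tree-ensemble", "svm", "neural-net"]
structures = ["flat", "hierarchical", "sequence"]

configs = list(product(feature_sets, model_families, structures))
print(len(configs))  # 4 * 4 * 3 = 48 candidate pipelines

# ...and each pipeline still needs k-fold CV over its own
# hyperparameter grid, say 20 candidates with 5 folds:
print(len(configs) * 20 * 5)  # 4800 model fits
```

And that is with only three meta-choices; each new axis multiplies the count again.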
While the generic ML tools work really well, it takes domain knowledge to find the best way of applying them to the problem at hand, especially since the problem almost never fits the classification/regression-from-IID-training-data model that most algorithms are designed around. At first this might seem counter-intuitive, but I've seen dramatic reductions in error rate just from picking good features or a reasonable model structure, in ways that aren't easy to automate. And while deep learning and structure learning try to address these problems, nonconvexity and very long training times make those algorithms unrealistic in many situations (and, consequently, make them underperform simpler methods with clever domain engineering).
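A toy illustration of the "good features beat more tuning" point, using a synthetic target that depends on an interaction between two inputs (all names and data here are made up; the least-squares classifier stands in for any generic linear method):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = np.sign(X[:, 0] * X[:, 1])  # XOR-like target: not linear in the raw features

def linear_fit_accuracy(features, y):
    # Generic black box: least-squares linear classifier, sign(features @ w + b).
    A = np.hstack([features, np.ones((len(features), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean(np.sign(A @ w) == y))

raw_acc = linear_fit_accuracy(X, y)  # near chance: no linear signal in raw X

# One engineered feature encoding the domain insight (the interaction matters):
X_eng = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])
eng_acc = linear_fit_accuracy(X_eng, y)  # near perfect

print(raw_acc, eng_acc)
```

The same black-box learner goes from coin-flipping to almost perfect purely because a human knew which derived feature to feed it.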
Absolutely. In light of your comment, perhaps I am misunderstanding what the original article means by a generic learning algorithm?
The points you make are well understood in academia. There are probably hundreds of papers on feature selection and domain-specific modelling in bioinformatics, for example.
In terms of boxed learning algorithms, I would assume such a thing would provide a way to supply models and inputs in a variety of formats, the latter allowing users to do their own domain-specific feature selection or other data reduction before applying a particular learning algorithm. In that sense, I could see things like Google's Prediction API being useful in principle, even though it won't eliminate the large domain-specific portion of the work.
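A minimal sketch of that division of labor, assuming the "boxed" service owns only a generic learner (here a toy nearest-centroid classifier) while the user supplies the domain-specific featurization; everything here, including the `gc_features` example, is hypothetical:

```python
from collections import defaultdict
from typing import Callable, Sequence

def train(records: Sequence, labels: Sequence,
          featurize: Callable[[object], tuple]) -> dict:
    """Generic 'boxed' learner: nearest-centroid. It knows nothing about
    the domain; all domain knowledge lives in the user's featurize()."""
    totals, counts = defaultdict(lambda: None), defaultdict(int)
    for rec, lab in zip(records, labels):
        v = featurize(rec)
        totals[lab] = v if totals[lab] is None else tuple(
            a + b for a, b in zip(totals[lab], v))
        counts[lab] += 1
    return {lab: tuple(t / counts[lab] for t in tot)
            for lab, tot in totals.items()}

def predict(centroids: dict, record, featurize) -> object:
    v = featurize(record)
    return min(centroids, key=lambda lab: sum(
        (a - b) ** 2 for a, b in zip(centroids[lab], v)))

# User side: domain-specific featurization of, say, DNA sequences.
def gc_features(seq: str) -> tuple:
    return (seq.count("G") / len(seq), seq.count("C") / len(seq))

model = train(["GGGGAA", "GGGCAA", "ATATAT", "AATTAA"],
              ["gc_rich", "gc_rich", "at_rich", "at_rich"], gc_features)
print(predict(model, "GGCGGA", gc_features))  # -> gc_rich
```

Swapping in a different `featurize` retargets the same generic box to an entirely different domain, which is roughly the interface a Prediction-API-style service would need to expose.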
So the point is that there is simply no way to solve such a problem using only the data contained in the data set.
I wonder, then: could a system be developed for capturing the minimal required domain knowledge, either in the data set itself or in some other form, especially as that knowledge evolves over time?
> I also think you miss the point of academic papers. The goal is not to build a product, but rather to understand algorithms.
I actually don't, being a researcher myself (not in ML, but in a field that uses a lot of it). I'm just saying that real-world datasets in industry are nothing like the toy datasets that a lot of university papers are written against: there's a lot more noise, and you'd never be able to get a good classification (for example) using just one coherent set of techniques.
On the other hand, KDD/WWW/ICML and other data mining conferences are increasingly dominated by industry folks now, so my experience may not be as common anymore.