I wonder who their target market is. ML requires a solid math background and the ability to customize every detail of the process, which drag-and-drop tools never fully provide. I understand that it's always beneficial to have more user-friendly tools, but I still think ML experts - at least those who don't just copy-paste code snippets from SO - would prefer the more professional R and Python packages.
Maybe Microsoft aims at teaching ML to beginners, which would still be detrimental if they get used to just that.
I had this exact same thought when I read the headline. It seems like MS and others view ML as an opportunity similar to Big Data/BI ten years ago. You saw the "democratization of data" as people with little technical skill could suddenly create analytics dashboards in tools like Tableau.
In my opinion, ML is far too easy to get critically wrong during design/implementation for it to follow this same path. And what's more, if you mess up an analytics dashboard, it's usually fairly obvious. In ML, there are MANY ways to mess up a model and no easy way to tell.
If someone doesn't have the technical experience behind creating these models, I would not trust any output they give me from one of these tools. And if they do have the experience, they certainly wouldn't be choosing to use these tools either.
Can you please elaborate on what kind of critical mistakes a machine can make that someone with a math background would not?
I am building a competing tool, so I am not affiliated with MS, but I do think that AutoML has value.
Machine learning is different from imperative programming in that most of the "programming" is done through experiments rather than an actual "program", hence there is an opportunity to replace programming with compute. I.e. an AutoML platform can create hundreds of models/pipelines and just try them all.
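A toy sketch of that "replace programming with compute" idea (my own illustration, not any vendor's actual product): sweep a family of candidate models and keep whichever scores best on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=200)   # unknown "true" relationship
x_tr, y_tr, x_va, y_va = x[:150], y[:150], x[150:], y[150:]

def val_mse(deg):
    """Fit a polynomial of the given degree and score it on held-out data."""
    coeffs = np.polyfit(x_tr, y_tr, deg)
    return np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)

# The "AutoML" part: no human picks the model, we just try them all.
scores = {deg: val_mse(deg) for deg in range(1, 10)}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Real systems sweep pipelines and hyperparameters, not just polynomial degrees, but the selection loop is the same shape.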
Also, why would you trust a model that was created manually over a model that was auto-created?
When a model is created by AutoML, it passes the same validation process as a manually created model, so in both cases the quality of the model should be judged independently of the way it was created.
In addition, all models (regardless of how they were created - human or not) should be monitored for predictive performance. I.e. I will not "trust" any model without continuous verification.
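A minimal sketch of what that continuous verification might look like (the `monitor` function, window size, and threshold are my own assumptions, not any standard API): track a rolling error and flag when it drifts from the baseline.

```python
import numpy as np

def monitor(errors, window=50, threshold=2.0):
    """Flag drift when recent mean error exceeds baseline by `threshold` sigma."""
    baseline = errors[:window]
    mu, sigma = baseline.mean(), baseline.std()
    return errors[-window:].mean() > mu + threshold * sigma

rng = np.random.default_rng(0)
stable = rng.normal(1.0, 0.2, 200)    # model error while data matches training
drifted = rng.normal(2.5, 0.2, 50)    # error after the input distribution shifts

print(monitor(stable))                               # no alarm on stable errors
print(monitor(np.concatenate([stable, drifted])))    # alarm: retrain or investigate
```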
A common error is target leakage. An AutoML system will likely consider a leaked column a "strong feature". This is where having someone who actually understands the business domain is critical.
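To make the failure mode concrete, here's a hypothetical sketch: a "feature" that is really a noisy copy of the target (say, a column recorded after the outcome) will top any automated correlation or importance ranking, and only domain knowledge reveals it wasn't available at prediction time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.normal(size=n)                   # target, e.g. amount a customer repaid
legit = 0.3 * y + rng.normal(size=n)     # genuinely predictive, noisy feature
leaky = y + 0.01 * rng.normal(size=n)    # leaked: effectively the target itself

corr_legit = abs(np.corrcoef(legit, y)[0, 1])
corr_leaky = abs(np.corrcoef(leaky, y)[0, 1])
# The leaked column dominates any automated feature ranking (|corr| ~ 1),
# so an AutoML system happily builds a "great" model around it.
print(f"legit: {corr_legit:.2f}  leaky: {corr_leaky:.2f}")
```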
There's no question that there's value in AutoML systems, yet most ML production systems I've worked on or seen were far more complex than feature vector -> model -> prediction.
You likely have multiple models, pipelines, normalizations, and plain old conditionals. Hard to automate all of this.
Right. I am aiming at companies that have zero data scientists and would like to avoid hiring one. I assume their use cases are simple/common and can be automated.
Note that automation is not only building the model, but automating the full life cycle - preprocessing, hyperparameter optimization, pipeline deployment, and monitoring/retraining.
> "Can you please elaborate more on what kind of critical mistakes a machine can make, while someone with math background would not make.
I am building a competing tool"
the short answer is: go study stats and the fundamentals of ML instead of asking HN to build your product for you.
> "why would you trust a model which was created manually and not a model which was auto created."
one of many reasons: domain knowledge is important, and math alone can't tell you things are messed up. Contrived example: you build a linear regression model to predict home price, and square footage gets a negative coefficient. Math conclusion: bigger house = lower price. Domain knowledge: oh, we are missing a feature, and the model can't tell the difference between city homes and rural homes.
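The contrived example above is easy to reproduce with synthetic numbers (all figures are my own assumptions): city homes are small but expensive, rural homes are big but cheap; omit the city indicator and the fitted square-footage coefficient flips negative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
city = rng.integers(0, 2, n)               # 1 = city, 0 = rural (the hidden feature)
sqft = np.where(city == 1,
                rng.normal(1200, 200, n),  # city homes: small
                rng.normal(2500, 300, n))  # rural homes: big
# Location carries a large premium; size adds a modest $100/sqft.
price = 100 * sqft + 400_000 * city + rng.normal(0, 20_000, n)

def fit(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

coef_naive = fit(sqft, price)[1]                        # city omitted: negative
coef_full = fit(np.column_stack([sqft, city]), price)[1]  # city included: ~100
print(coef_naive, coef_full)
```

The model with the omitted variable validates fine on data drawn the same way; only domain knowledge tells you the negative coefficient is nonsense.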
there is value to AutoML, but there is a lot of room to go horribly wrong.
Again, my point is that for a given data set, an AutoML system is much more efficient and radically cheaper than a human modeler.
You are pointing to an area outside the realm of AutoML (feature engineering/generation), which is domain specific. But this was not my original question.
This has nothing to do with feature engineering or generation. I never added or changed any features in the example. It is exactly in the realm of AutoML: you run a model and, because you are missing data, your model makes wrong assumptions.
You could argue (which you didn't) that this falls under model interpretation, but a model like the one in this example would probably fail to generalize and make bad predictions in the future, i.e. slamming home values because they have large square footage.
>In ML, there are MANY ways to mess up a model and you have no easy way to tell.
What about all those businesspeople who only hire analysts to tell them (and their peers) what they want to hear? Now they can tell themselves what they want to hear, having laundered it through a computer.
A lot of ML problems are already solved fairly well and people want to use them in their products. For example, say you wanted to make a smartphone app that had some kind of image recognition. You aren't trying to invent a new machine learning algorithm. This tool would be very convenient for making an app like that.
You wouldn't trust one of these people to work on your project, would you?
The problem gets worse with unsupervised ML, e.g. cluster analysis. Whatever variables you choose, clustering will give you some result. But only an experienced person can understand which variables to choose, how to do the clustering, and what those clusters really mean. You can't just try different things in clustering until it "works", because it always "works".
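A quick illustration of "it always works" (minimal hand-rolled k-means, my own sketch): run it on pure uniform noise and it still happily partitions the data into k "clusters".

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's-algorithm k-means; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Uniform noise: there is no cluster structure at all...
X = np.random.default_rng(42).uniform(size=(300, 2))
labels = kmeans(X, k=5)
# ...yet we still get a confident-looking partition into "clusters".
print(np.bincount(labels, minlength=5))
```

Nothing in the output warns you that the clusters are meaningless; that judgment has to come from the analyst.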