That's a good read. Some of those problems are well known from quantitative finance, where people have been trying to extract statistical signals from data for decades. After all that effort, the easy wins (the correlations that are easy to find) have been found and are no longer wins, because too many players are already acting on them.
Some of the other problems listed are new, coming from taking what used to be research techniques and putting them into production programs. Those are more like ordinary big system problems, such as configuration management. The article points out, though, that your huge training set is now part of your configuration.
Then there's the problem of systems assigning excessive significance to features which happen to work but are really irrelevant. Those image recognition demos on HN last week illustrated this problem. At least there, though, there's a way to visualize what's happening. For many ML systems, humans have no clue what the system is actually recognizing. If your ML algorithm has locked onto the wrong features, it can become drastically wrong due to a minor change in the data. I saw an example of this in a web spam filter that was doing fairly well, working on 4-character sequences taken from web pages. It was actually recognizing specific code text used by one ad service. The page content seen by humans was totally irrelevant.
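To make that failure mode concrete, here's a minimal sketch (hypothetical code, not the actual filter described above) of character 4-gram features taken from raw page source. Because the features come from the markup rather than the rendered text, a snippet injected by an ad network can dominate what the classifier learns:

    from collections import Counter

    def char_4grams(page_source):
        # Count overlapping 4-character sequences in the raw page source.
        return Counter(page_source[i:i + 4] for i in range(len(page_source) - 3))

    # Both pages share the same ad-network snippet; only the human-visible text differs.
    spam_page = '<script src="ads.example/serve.js"></script><p>Buy cheap pills now!</p>'
    ham_page = '<script src="ads.example/serve.js"></script><p>Photos from my holiday</p>'

    # The most frequent 4-grams in both pages come from the shared snippet,
    # so a classifier trained on these features can key on the ad code
    # instead of the content a human actually reads.
    print(char_4grams(spam_page).most_common(5))
    print(char_4grams(ham_page).most_common(5))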
"We made a great prediction that didn't pan out because others also acted on the same prediction" or "the future just didn't turn out like the past in that particular data-mined aspect and we don't know why"? I know which story I'd want to tell were it my unprofitable system.
This paper's conclusion (I only read the abstract) jibes with my experience. Whenever I've tried to make something "intelligent", it has always ended in headaches. Nowadays, when writing code, I try to stay away from having the computer make decisions. I've found it's much easier to build things in a way that puts a real human in charge of making all the decisions.
Unfortunately, that's not scalable for a whole bunch of problems. I agree it's a headache, but the solution is not to stay away from machine intelligence.
What I meant was that for projects that need to scale, i.e. where you can't have humans making the decisions, there is no way to avoid this. If your project doesn't need to scale in that way, my statement doesn't apply; in fact, I quite agree with you, and I'm not sure why one would use machine learning in that case!
It is fine (good even!) to design things in a way that lets humans make the intelligence-required decisions, as that becomes a nice modular interface for intelligent optimization systems that can be introduced later. It also provides a nice heuristic: if your human interface doesn't expose the data used to make a computationally optimal intelligent decision, then it isn't a good human interface. And it works quite well with Agile principles: if you iterate on that interface until the humans start to like it, that's a good signal that it's time to start introducing intelligent optimization. Some of the most groundbreaking intelligent systems at Amazon have followed the same principles.
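As a rough illustration of that interface idea (all names here are made up, not anything Amazon actually uses): put the decision behind an interface that also surfaces the data needed to make it, so the human decider can later be swapped for an optimizer without changing any callers.

    from typing import Protocol

    class ReorderDecider(Protocol):
        def decide(self, item, stock, weekly_sales):
            """Return how many units of the item to reorder."""

    class HumanDecider:
        def decide(self, item, stock, weekly_sales):
            # The interface exposes exactly the data an optimal policy would need.
            print(f"{item}: stock={stock}, weekly_sales={weekly_sales}")
            return int(input("Units to reorder: "))

    class HeuristicDecider:
        def decide(self, item, stock, weekly_sales):
            # Dropped in later, once humans are happy with the interface:
            # keep roughly four weeks of cover on hand.
            return max(0, int(4 * weekly_sales) - stock)

    def run_reorder(decider):
        return decider.decide("widget", stock=12, weekly_sales=9.5)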
I think half of the problem is that ML practitioners often don't know much about software engineering. The other half is that people frequently manage programs as if they were projects.
A project in management theory is considered to have a plan covering budget, scope and time / deadline [1].
Such an approach makes sense in engineering disciplines [†] that benefit from centuries of experience -- say, bridge building -- but the current state of affairs is that the predictability of software development is sketchy at best. One cannot reliably estimate all three of budget, scope, and time for a software project, especially when significant unknown unknowns [2] are involved. There are well-known examples of software projects ending up breaking two or even all three of the planned constraints of budget, scope, and time.
So far, various iterative methodologies (usually connected to agile methodologies [3]) seem to suit software development better. They differ significantly from the single-project approach: the development effort consists of successive small iterations, each reasonably predictable on its own, with overall progress tracked as velocity.
As unanticipated problems are encountered, they are incorporated into the plan in a natural way, with corresponding adjustments to budget, scope, and time.
> sometimes a claim is made software engineering is not real engineering due to lacking in predictability
There are some projects that could be considered real engineering, usually safety-critical applications running at a nuclear power plant or on a spacecraft, where everything is formally verified.
But the vast majority of software engineering isn't run like a real engineering project.
Side note: I love the analogy of a credit card to describe technical debt; it's one I've used with clients before, and they really respond to it.
People (in the UK at least) understand that the interest on a credit card can kill you, whereas a regular loan assumes you're paying off the principal month to month. Asking whether they're paying off that technical debt month to month, or whether it's just sitting there, really gets people thinking.
The point of a metaphor is to describe something so that the other person will be able to understand the original point. Choosing a metaphor is not a case of finding something as similar as possible; you have to find something that the other person will already understand. Consequently, the credit card metaphor is far better despite being less accurate.
Sure (as I tried to point out) but that doesn't mean that (a) every manager in the company understands it or (b) people in other types of business/industry understand it. Claiming that the concept exists is pretty useless if the appropriate language isn't used to describe it. For example, telling my Dad that he really needs to abate his ingestion of sodium chloride is not going to be as effective as telling him to cut down on salt.
Let's say you are an airline and I am an oil company. You want to have a predictable cost for your fuel so you can plan, and I want to charge whatever is the market price at the time. So a bank comes along and agrees, for a modest fee, that in a year's time you can buy fuel at a fixed price. In a year, they buy fuel from me and give it to you. If my price is lower, they pocket the difference as profit, but if it's higher, then they have to eat the cost and therefore make a loss (which is unlikely to be entirely covered by the fee you already paid them, but that's fine, because what you were really paying for was peace of mind).
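Putting toy numbers on that arrangement (figures made up purely for illustration): the bank collects the fee up front, then at delivery buys at whatever the market price turns out to be and sells to the airline at the fixed price.

    fixed_price = 100.0   # price per unit the airline locked in
    fee = 3.0             # per-unit fee the airline already paid the bank

    for market_price in (90.0, 100.0, 115.0):
        # Bank buys at the market price, delivers at the fixed price, keeps the fee.
        bank_pnl = (fixed_price - market_price) + fee
        print(f"market={market_price:6.1f}  bank P&L per unit={bank_pnl:+6.1f}")

    # If the market price ends up well above the fixed price, the bank's loss
    # exceeds the fee; the airline paid that fee for price certainty either way.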
People have been doing this with agricultural products for at least 3000 years.
Agriculture and airlines tend to use futures to hedge, not options.
A (long) futures contract is the right AND obligation to purchase a commodity at a set price at a set time (and place).
A (call) options contract is the right BUT NOT the obligation to purchase a commodity at a set price within a set time (for the most typical American-style option).
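The payoff difference at expiry is easy to sketch (ignoring fees, margin, and premium financing): a long futures position gains and loses symmetrically around the agreed price, while a long call loses at most the premium because exercise is optional.

    strike = 100.0  # the agreed price in both contracts

    def long_futures_payoff(spot):
        # Obligated to buy at the strike no matter where the spot price ends up.
        return spot - strike

    def long_call_payoff(spot, premium=3.0):
        # Right but not obligation: only exercise when it is worth doing.
        return max(spot - strike, 0.0) - premium

    for spot in (85.0, 100.0, 120.0):
        print(f"spot={spot:6.1f}  long futures={long_futures_payoff(spot):+7.1f}"
              f"  long call={long_call_payoff(spot):+7.1f}")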
To me, this is a big part of what makes machine learning exciting: it's so challenging to implement well. The result is that machine learning touches a lot of computer science, from high-level languages and formal verification to low-level languages and systems concerns (GPU programming, operating systems).
This difficulty is also a reason why machine learning programmers who have at least proven themselves tend to get a lot of trust from the business that CommodityScrumDrones don't get (and that's why most good programmers want to redefine themselves as "data scientists"; it's the promise of autonomy and interesting work). No one tells a machine learning engineer to "go into the backlog and complete 7 Scrum tickets by the end of the sprint". Of course, the downside of all this is that true machine learning positions (which are R&D-heavy) are rare, and there are a lot more so-called "data scientists" who spend most of their time benchmarking off-the-shelf products without the freedom to get insight into how they work.
I actually think that the latter approach is more fragile, even if it seems to be the low-risk option (and that's why mediocre tech managers like it). When your development process is glue-heavy, the bulk of your people will never have or take the time to understand what's going on, and even though operational interruptions in the software will be rarer, getting the wrong answer (because of misinterpretation of the systems) will be more common. Of course, sometimes using the off-the-shelf solution is the absolute right answer, especially for non-core work (e.g. full-text search for an app that doesn't need to innovate in search, but just needs the search function to work) but if your environment only allows programmers to play the glue game, you're going to have a gradual loss of talent, insight into the problem and how the systems work, and interest in the outcomes. Reducing employee autonomy is, in truth, the worst kind of technical debt because it drains not only the software but the people who'll have to work with it.
At any rate, I'd say that while this seems to be a problem associated with machine learning, it's really an issue with complex functionality in general. Machine learning, quite often, is something we do to avoid an unmaintainable hand-written program. A "black box" image classifier, even though we can only reason about it empirically (i.e. throw inputs at it and see what comes out), is going to be, at the very least, more trustworthy than a hand-written program that has evolved over a decade and has had hundreds of special cases written into it, coming from aged business requirements that no longer apply and from programmers across a wide spectrum of ability. All in all, I'd say that ML reduces total technical debt; it's just that it lets us reach higher levels of complexity in functionality, and get to places where even small amounts of technical debt can cause major pain.