Honestly, skip all of the courses. Pick a problem to solve, start googling for common models used to solve it, then go on GitHub and find code that solves that problem or a similar one. Download the code and start working with it: change it, experiment. All of the theory and such is mostly worthless; it's too much to learn from scratch and you will probably use very little of it. There is so much ML code on GitHub to learn from; it's really the best way. When you encounter a concept you need to understand, google it and learn the background info. This will give you a highly applied and intuitive understanding of solving ML problems, but you will have large gaps. Which is fine, unless you are going in for job interviews.
Also bear in mind that courses like fast.ai (as you see plastered on here) aggressively market themselves by answering questions all over the internet. It's a form of SEO.
EDIT (Adding this here to explain my point better):
My opinion is that the theory starts to make sense after you know how to use the models and have seen different models produce different results.
Very few people can read about the bias-variance trade-off and, in the course of using a model, understand how to take that concept and directly apply it to the problem they are solving. In retrospect, they can look back and understand the outcomes. Also, most theory is useless in the application of ML, and only useful in active research into new machine learning methods and paradigms. Courses make the mistake of mixing in that useless information.
The same thing is true of the million different optimizers for neural networks. Why different ones work better in different cases is something you learn when trying to squeeze performance out of a neural network. Who here is intelligent enough to read a bunch about SGD and optimization theory (Adam, etc.), understand the implications, and then use different optimizers in different situations? No one.
I'm much better off having a mediocre NN, googling "How to improve my VGG image model accuracy", and then finding out that I should tweak learning rates. Then I google "learning rate", read a bit, and try it on my model. Rinse and repeat.
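The learning-rate tweak is easy to feel even on a toy problem. Here's a minimal sketch (pure Python, made-up numbers) of why the step size is the first knob people tell you to turn: the same gradient-descent loop behaves completely differently depending on it.

```python
# Toy illustration: minimize f(w) = (w - 3)^2 with plain gradient descent,
# trying a few learning rates. Too small crawls; too large diverges.

def train(lr, steps=50, w=0.0):
    """Run `steps` gradient-descent updates on (w - 3)^2."""
    for _ in range(steps):
        grad = 2 * (w - 3)   # d/dw of (w - 3)^2
        w -= lr * grad
    return w

for lr in (0.01, 0.1, 1.1):
    final = train(lr)
    print(f"lr={lr}: final w = {final:.3f}, loss = {(final - 3) ** 2:.3g}")
```

With lr=0.1 you land near the optimum w=3; with lr=0.01 you're still far away after 50 steps; with lr=1.1 the iterates blow up. That "try it, see what happens" loop is exactly the googling cycle described above.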
Also, I will throw in my conspiracy theory that most ML researchers and such push the theory/deep stats requirement as a form of gatekeeping. Modern deep learning results are extremely thin when it comes to theoretical backing.
Watch maybe one or two short videos on back propagation. You don't need to be muddled in the theory and the math - you can become productive right away.
Once you start playing with pytorch and tensorflow models (train them yourself or do transfer learning), you'll start to develop an intuition for how the network graphs fit together. You'll also pick up tools like tensorboard.
Also, do transfer learning. It's so awesome to train on a publicly available, high-quality, large data set, train for a lot of epochs for a good problem-domain fit, then swap in your own smaller data set. It's magical.
I have a feeling that ML in the future will be like engineering today. You can learn by doing and don't need a degree or formal background to be productive and eventually design your own networks.
I have no formal training (save one undergrad course that was way outdated in "general AI"), and I've designed my own TTS and voice conversion networks. I have real time models that run on the CPU for both of these, and as far as I know they're more performant than anything else out there (on CPU).
Eventually you might start reading papers. (You'll be productive long before you need to do this.) Most ML papers are open access, but review (broad survey) articles might need pirating. Thankfully there are websites that can help you get these. The papers aren't hard to read if you've spent some time playing with the networks they pertain to. Read the summary, abstract, and figures before diving into the paper. It may take a few reads and some googling.
You do not need to be a data scientist. Anybody can do it. That said, a good GPU will help a lot. I'm using two 1080Ti in SLI and they're pretty decent.
I feel somewhat similarly. If you want to learn ML from the “ground up” that means learning math (at least a few subjects) to the senior undergraduate level, some numerical methods, some probability and statistics, sprinklings of other stuff before you even get to the models. And it’s not even clear that stuff is important for ML in practice.
I'm someone who took all those math courses and some grad ML coursework. What that means is that I'm qualified to try to hack together some specific research-level things that a practitioner would be confused by, and then try to write a paper about it. It doesn't mean I'm qualified to do what the practitioner does. Frankly, I've never run my code on anything other than MNIST, and I don't know the different architectures or applications well, since they're not directly what I work on. They're just different things, as I see it.
> I have no formal training (...) I have real time models that run on the CPU (..) and as far as I know they're more performant than anything else out there
> You do not need to be a data scientist. Anybody can do it. That said, a good GPU will help a lot. I'm using two 1080Ti in SLI and they're pretty decent
An alternative is that, by not knowing what you are doing, you may not see all the options that exist -- and when you hit a problem too hard, you just throw more hardware (GPUs) at it.
This is not to say it is never a valid approach, but I'd be wary of someone who, say, hasn't had any formal training in C and claims his stuff is more performant than anything out there, because lack of training means not knowing what already exists.
> An alternative is that, by not knowing what you are doing, you may not see all the options that exist -- and when you hit a problem too hard, you just throw more hardware (GPUs) at it.
Maybe some will. As I just explained, I'm running my models on CPUs, so I'm actually developing sparse, efficient, resource-constrained models that evaluate quickly.
I've been working with libtorch's JIT engine in Rust (tch.rs bindings).
I'm currently trying to adapt Melgan to the Voice Conversion problem domain so I can get real time, high-fidelity VC without using a classical vocoder. WORLD works great and quickly, but it's a poor substitute for the real thing as it only maps the fundamental frequency, spectral envelope, and aperiodicity. Melgan is super high quality and faaast.
Are you working on VC (input: speech of one speaker, output: the same spoken content, but sounds like another speaker) or speaker-adaptive speech synthesis (input: text, output: speech)?
Also check out ParallelWaveGAN, another high-quality and very fast (on CPU) neural vocoder.
> You do not need to be a data scientist. Anybody can do it. That said, a good GPU will help a lot. I'm using two 1080Ti in SLI and they're pretty decent.
>Also, I will throw in my conspiracy theory that most ML researchers and such push the theory/deep stats requirement as a form of gatekeeping.
Learning the fundamentals of a field is supposed to be gatekeeping. It's what stops you from making stupid mistakes. The field of ML is littered with horrible errors made by people who don't know the fundamentals.
Your analogy is wrong, i.e. you are comparing apples to oranges. ML is very different from other "normal" computational systems.
* Non-ML: Input + {Rules} = Output
* ML: Input + Output = {Rules}
where "{Rules}" = Infinite set of possible "Programs" each of which is a trace through a very large state space of variables.
In the first case, we humans use all our ingenuity to write the program and tweak it to get the right results. We already know the difficulties involved in writing "correct" programs but have mastered it to some extent.
In the second case, you cannot do that. Your "Programs" are derived by the system and encoded in numbers. How in the world do you even know that your encodings are correct? This is why you need the techniques of Mathematics to transform (eg. Linear Algebra) and constrain (eg. Inferential Statistics/Probability) the output "Rules" so you can have some measure of confidence in it. This is the fundamental challenge inherent in ML.
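A toy sketch (numpy, hypothetical values) of that Input + Output = {Rules} direction: in the non-ML version we write the rule by hand; in the ML version the system derives the "rule" as fitted numbers from input/output pairs, which is exactly why you need statistical tools to judge whether those numbers can be trusted.

```python
import numpy as np

# Non-ML: we write the rule ourselves.
def fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# ML: hand the system inputs and outputs and let it derive the rule.
# Here the "rule" is just two fitted numbers: a slope and an intercept.
c = np.array([0.0, 10.0, 20.0, 30.0, 40.0])   # inputs
f = fahrenheit(c)                              # observed outputs
slope, intercept = np.polyfit(c, f, 1)         # least-squares fit

print(slope, intercept)  # recovered, not written by us: ~1.8 and ~32.0
```

With one linear rule and clean data the fit is exact; real models encode their "rules" in millions of such numbers, with noise on top, and that's where the confidence machinery comes in.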
> How in the world do you even know that your encodings are correct?
Easy: you know that they aren't, and never will be, entirely correct for complex enough ML problems, just like humans. Handling those errors is not an ML topic, though; you just have to ensure, via old-fashioned system design, that the system you build doesn't depend on any ML model always outputting correct results.
You can say that about any field, discipline or skill.
At the same time, there is a difference between someone just starting to learn it and someone wanting to apply it in a large production system with social implications (be it advertising, medicine, or anything else). Hobby projects, or even small startups, rarely fall into that region.
Moreover, even a profound knowledge of mathematics does not give any edge in ethics, or even awareness of problems with real data (noise, bias, malicious use, social reception, etc.).
I hope you're trolling, because this is a guaranteed way to climb a peak of stupidity [1]. If OP is determined to get a bit deeper than 30-minute guides on Medium, there is certainly theory to learn. But it is merely second-year college material, and you can probably skip Kolmogorov's axiomatics and measure theory; it won't hurt your understanding of bleeding-edge research.
I disagree with you. Both ways work: starting from theory, or starting from practice.
However, in a business setting, starting from practice is much more effective. As a lead dev and a manager with over 20 years of experience in AI/ML, I've trained several engineers in building ML systems.
I always start with a business problem and point them to resources (frameworks, blogs, jupyter notebooks) to help them along. The problem is small enough for them to solve in less than a quarter. I avoid micromanaging them and will only answer larger questions by providing more resources. If they really get stuck I'll sit with them and walk through the issue. I have yet to have an engineer be unable to 1) get a model working and 2) tune it to production quality.
What usually happens is that people get something working, think they now know ML, but don't even generally know enough to understand the things they did wrong, and never end up getting to the theory.
The best approach is to learn both concurrently. Learn some theory, apply it and understand the applications, including pitfalls, then learn a bit more and repeat. Incremental learning on a solid base. It's fun to hate on academia, but this is how experts with deep knowledge of a domain get to where they are.
Sadly, you are 100% correct. I see the same problems over and over in newly published AI research papers.
That said, playing for 1-2 weeks might be a good start towards getting motivated for learning the difficult and dry theory needed to excel in this field.
I personally started with Kaggle competitions and lots of googling (duckduckgoing, right?), but quite quickly hit the wall of not understanding; I felt like a mindless creature making decisions based on a couple of guides out there. Watching lectures from Andrew Ng and reading some books helped a lot, but I can't see a reason why one wouldn't want to start with theory. It's no gold and glitter, and no one promised you that, unless you really want to delegate your work to AutoML.
I guess his point is to tackle it from a top-down approach. For me, that's how I'm breaking ground in my ML study. I tried Andrew Ng's course; I didn't understand a thing.
Then I tried Kaggle's mini-course. It kickstarted me into ML and motivated me to learn the theory as I go. For example, when I got to apply Random Forest Regressor, I went to Wikipedia and tried to read on it. Got some idea. And the progress is good.
Maybe for some of us, I think top-down is motivating and makes the learning process enjoyable.
Same here. I tried Andrew Ng's course a few times ever since it launched a few years back but I could only get through half of it. Fast ai makes more sense to me and I've picked up a decent amount of concepts where I can now go back and feel confident enough to tackle theory.
The danger is throwing something into production without understanding bias and variance, overfitting (or other important concept) with potentially disastrous results.
One cannot do ML without some basic theoretical knowledge of Statistics and Probability. This gives you the What and the Why behind everything. GIGO (garbage in, garbage out) is more true of ML than of other disciplines. The techniques used are so opaque that if you don't know what you are doing, you can never trust the results.
One thing that made the Uber fatality possible was their overconfidence in their AI, which they apparently did not fully understand. They considered the car's integrated emergency collision braking system unnecessary and disabled it...
“Scientists start out doing work that's perfect, in the sense that they're just trying to reproduce work someone else has already done for them. Eventually, they get to the point where they can do original work. Whereas hackers, from the start, are doing original work; it's just very bad. So hackers start original, and get good, and scientists start good, and get original.” - Paul Graham in Hackers and Painters
BTW: While information theory is everywhere, I have yet to see where measure theory makes an impact on practical deep learning. The importance of pure math for practical machine learning is highly overrated (and I speak as someone who studied it).
No. You will not get beyond copy-paste level without being comfortable with ML foundations. That doesn't mean you need to be able to prove variational inference bounds in your sleep, but you'll want to know why we need things like lower bounds for approximate inference.
Sure, go through the fastai material and maybe write a blog post about how you learned ML (read: DL) in a few months. What you really learned is copy-pasting code (as you mentioned) and some neural net tricks (like a good learning-rate to start SGD).
How to learn ML? Do fastai + reading Daphne Koller's and Chris Bishop's books on PGMs + re-implementing a paper on Gaussian process classification + another paper on GNNs + ....
bishop's book is a good suggestion (i prefer hastie) for ml but you have to admit that
1. fastai is neural nets
2. bishop's book (and whoever else's) are grad books that require considerable mathematical training to really profit from
3. the aforementioned books don't teach anything practical!
so ultimately i completely agree with the op of this thread - just jump in and read around when things don't work how you expect.
Just go for it. Learning the math just helps you understand it’s not magic, like learning to program helps you understand computers aren’t magic.
As someone that learned a good bit of the math and implemented NN code with backprop from scratch, I agree with the parent. To learn the math and get better results than cutting edge ML researchers would be as likely as winning the lottery.
As an exercise, the math is fun to learn and not terribly complicated for backprop type of stuff.
For what it's worth, this is basically the learning model fast.ai works on. You start by just applying pre-built models to things, then learn how to tweak them, then learn the theory that makes the tweaks work.
kudos to OP! AI & ML are also on my list for 2020!!
> All of the theory and such is mostly worthless, its too much to learn from scratch and you will probably use very little of it.
i, too, believe in code before theory. but not for stats, artificial intelligence, machine learning, numerical computing, etc. why?
because, for instance, if you compare a popular & successful machine learning framework to a "build your own deep neural network in 150 lines of python", the difference in data-structure and programming-construct choices will be staggering.
especially if you are an experienced programmer, or just someone who cares about data structures and programming constructs in the first place. these choices are not accidental!
you will find that "parameters" are represented by a "class", i.e. objects with associated operations, not plain values. why? because you want to do things like accumulate contributions to derivatives, and all those other calculus things i thought i was never going to use.
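for instance, a toy version of that design (not any real framework's api, just the idea of a parameter object that accumulates derivative contributions):

```python
class Parameter:
    """Toy framework parameter: a value plus an accumulator
    for derivative contributions."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def accumulate(self, contribution):
        # A parameter used in several places in the computation graph
        # receives one contribution per use; by the chain rule they sum.
        self.grad += contribution

    def step(self, lr):
        # Gradient-descent update, then reset (zero_grad, essentially).
        self.value -= lr * self.grad
        self.grad = 0.0

w = Parameter(2.0)
# Suppose w appears in two terms of the loss, e.g.
# L = (w*x1 - y1)^2 + (w*x2 - y2)^2 — each term contributes to dL/dw.
w.accumulate(0.5)    # contribution from the first term (made-up number)
w.accumulate(-0.2)   # contribution from the second term (made-up number)
w.step(lr=0.1)
print(w.value)       # 2.0 - 0.1 * (0.5 - 0.2) ≈ 1.97
```

a plain float can't do any of this, which is why frameworks wrap parameters in objects.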
ML engineer here. I didn’t take any ML classes in college and picked up most of what I know on the job.
I think this advice is directionally correct - reading through a theory-dense textbook like Bishop, which many consider to be a foundational ML textbook, is likely to be a bad use of your time. However, I think it does help to start with some theory, if only to give you the vocabulary with which to think about and get help with issues that you run into. At the risk of sounding like a broken record, Andrew Ng’s class on Coursera (https://www.coursera.org/learn/machine-learning) is quite good - it’s accessible with a bit of basic calculus knowledge (simple single variable derivatives and partial derivatives are all you need) and basic linear algebra (like, matrix multiplication). The whole class took me around 30 hours to get through, so if you’re determined, you could probably finish it in 2-3 weeks even if you’re pretty busy.
Also, if you like having text notes to refer to, I made these notes for myself a few years back when taking the class: https://github.com/tlv/ml_ng. There are some spots where, for my own understanding (I’m a bit of a stickler for mathematical rigor), I added more of the reasoning/equation pushing that Ng glosses over in his lectures. I would say that for a practical understanding of how to apply the concepts covered in the class, there’s no need to read those parts carefully (there’s a reason why Ng glossed over them).
But yeah, to all the people saying you should start by reading entire textbooks on multivariable calculus, statistics, and linear algebra...that’s not necessary. Most ML engineers I’ve met (and even most industry researchers, although my sample size there is much smaller) don’t understand all of those things that deeply.
Also, one last semi-related note - if you’re reading a paper and get intimidated by some really complex math, oftentimes that math is just included to make the paper look more impressive, and sometimes it’s not even correct.
Without experience in ML it's often hard to know what problems are solvable, how to frame the problem, and to tell a good solution in Github from a bad one, etc.
If you want to go an applied route I'd suggest starting somewhere like Kaggle and looking through the competitions for ones vaguely similar to yours. They've done all the hard work of choosing a challenging but solvable problem, sourcing and splitting the data, and choosing a metric. You then can see what techniques actually work really well, and benchmark different approaches. Academic challenges like Imagenet or Coco are also good for this, but you'll have to work harder to find relevant resources.
Once you've done this a couple of times, you can start framing your own problems, collecting and annotating your own datasets, deploying and maintaining models.
One thing I’ve personally seen is software engineers with an interest in deep learning use it to solve very simple problems that just need a linear statistical model. That’s a risk you take, and one reason “gatekeeping” happens.
If you want to raise your salary from $10 to $20 per hour, playing with existing models is the way to go.
If you want to make serious money solving real problems, take the time to learn about automated differentiation and all the related mathematics about how gradients flow backwards through the network.
But like the coding slave (great nick BTW) said, first play a bit, then learn how it works. Image transformation GANs are a lot of fun.
Here's why the learning part will be crucial to differentiate you from all the clueless outsourced cheap labor:
Recently, there has been a load of new AI papers by so-called scientists on optical flow, and even the greatest new approaches using millions of parameters and costing hundreds of thousands of dollars to train still DO NOT reach the general level of quality that the 2004 census transform approach had.
Similarly, there have been high-profile papers where people randomly chained together TensorFlow operations to build their loss function, oblivious to the fact that some intermediate operations were not differentiable and, hence, their loss would never back-propagate. As a result, all of their claims had to be fraudulent because one could mathematically prove that their network was incapable of learning.
The larger AI competitions have by now limited the number of submissions that teams are allowed to make per week, simply to discourage people from trying to guess the test results when their AI doesn't work as it should.
Or consider the Uber pedestrian fatality, where their neural network was overtrained (= bad loss function) to the point where it was unwilling to recognize bicycles at night.
And lastly, not knowing about gradient descent will just waste boatloads of money by 100x-ing your training time. Most stereo disparity and depth estimation AI papers use loss functions that only work on adjacent pixels. That means for a single correction to propagate to all pixels in a HD frame, you'll need 1920 iterations when only 1 could be sufficient.
You will find that my examples are all from autonomous driving. That's because here the discrepancy between GPU-powered brute force amateurs and skilled professionals is the most striking. German luxury cars have integrated lane-keeping, street sign recognition, and safety distance keeping for 10+ years, so for those tasks there are proven algorithms that work on a Pentium III in real time. And now there's lots of NVIDIA GPU kiddies trying to reinvent the wheel with limited success.
For your future employer, you having a firm grasp of how gradients work is the difference between mediocre and state of the art results, and between affordable and too expensive. So if there is one single AI skill that is both exhausting to learn and crucially important, it is differentiation and gradient flow.
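One cheap habit that catches the "loss that never back-propagates" mistake described above is checking `grad_fn` before training. A sketch in PyTorch:

```python
import torch

x = torch.tensor([0.2, 0.7], requires_grad=True)

# A differentiable loss: it carries a backward node, so gradients flow.
loss_ok = (x ** 2).sum()
print(loss_ok.grad_fn)        # a *Backward* node

# Integer-valued ops like argmax cut the autograd graph; anything
# upstream of them will never receive a gradient, however long you train.
loss_broken = x.argmax().float()
print(loss_broken.grad_fn)    # None: this "loss" cannot back-propagate

loss_ok.backward()
print(x.grad)                 # d/dx sum(x^2) = 2x
```

A `grad_fn` of `None` on your loss tensor means the network it is supposed to train will sit there unchanged, no matter how much GPU you throw at it.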
As others have pointed out, terrible wrong-headed advice to ignore all "Theory". ML cannot be studied by a scatter-shot approach but needs a systematic plan with Theory first followed by Practice and constant iteration between the two.
When I've learnt something, I find it helpful to work on well-known problems so you can compare with how others solve them. Kaggle was good for big-data stuff like that. I'm not sure about ML.
I definitely agree that you don't need to go deep into theory to be able to do useful things. But I think the bias-variance tradeoff is a very bad example of "useless theory". It's essentially just another name for overfitting/underfitting, which are approximately the most important ML concepts there are.
I would again argue that the natural progression for this concept is:
1.) Trains classifier
2.) "My train error was so low! Why is my validation error so high?"
3.) Googles -> "Why is my classifier's training error lower than my validation error?"
4.) Learns about overfitting
5.) Learns about bias-variance
It's always a natural progression. Reading about this stuff without encountering it means it usually doesn't stick, and really doesn't make that much sense.
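That progression is easy to reproduce on toy data (a numpy sketch with made-up numbers): an over-flexible model drives training error to nearly zero while validation error tells the real story.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = x ** 2 + rng.normal(0, 0.1, 30)   # true signal: quadratic, plus noise

x_tr, y_tr = x[:20], y[:20]           # training split
x_va, y_va = x[20:], y[20:]           # validation split

def mse(coefs, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return float(np.mean((np.polyval(coefs, xs) - ys) ** 2))

fit2 = np.polyfit(x_tr, y_tr, 2)      # sensible capacity
fit15 = np.polyfit(x_tr, y_tr, 15)    # far too much capacity for 20 points

print("degree 2 : train", mse(fit2, x_tr, y_tr), " val", mse(fit2, x_va, y_va))
print("degree 15: train", mse(fit15, x_tr, y_tr), " val", mse(fit15, x_va, y_va))
```

The degree-15 fit wins on training error (it is memorizing the noise) and loses on validation error; the gap between the two numbers is the overfitting you then go google about, and bias-variance is the formal name for what you trade as you move the degree.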
If you already have the concepts of training and validation error, then you're already there. The risk is not realising you can't test on your training data, or, more subtly, that you can't tune hyperparameters on your test data.
True, but I guess it depends on the person. Was just trying to give HN a view of how I write code. I've found it to be faster, but I go in knowing I will be doing a ton of googling.
This is one of the very few (!) concepts you need to know to get practical with ML. Why not watch a few videos on the concepts before you begin? They all use high-school math anyway.