
Here are some topics. Are they considered relevant to data science?

Matrix row rank and column rank are equal.

In matrix theory, the polar decomposition.

Each Hermitian matrix has an orthogonal basis of eigenvectors.

Weak law of large numbers.

Strong law of large numbers.

The Radon-Nikodym theorem and conditional expectation.

Sample mean and variance are sufficient statistics for independent, identically distributed samples from a univariate Gaussian distribution.

The Neyman-Pearson lemma.

The Cramer-Rao lower bound.

The martingale convergence theorem.

Convergence results of Markov chains.

Markov processes in continuous time.

The law of the iterated logarithm.

The Lindeberg-Feller version of the central limit theorem.

The normal equations of linear regression analysis.

Non-parametric statistical hypothesis tests.

Power spectral estimation of second order, stationary stochastic processes.

Resampling plans.

Unbiased estimation.

Minimum variance estimation.

Maximum likelihood estimation.

Uniform minimum variance unbiased estimation.

Wiener filtering.

Kalman filtering.

Autoregressive moving average (ARMA) processes.

Rank statistics are always sufficient.

Farkas lemma.

Minimum spanning trees on directed graphs.

The simplex algorithm of linear programming.

Column generation in linear programming (Gilmore-Gomory).

The simplex algorithm for min cost capacitated network flows.

Conjugate gradients.

The Kuhn-Tucker conditions.

Constraint qualifications for the Kuhn-Tucker conditions.

Fourier series.

The Fourier transform.

Hilbert space.

Banach space.

Quasi-Newton iteration and updates, e.g., Broyden-Fletcher-Goldfarb-Shanno.

Orthogonal polynomials for numerically stable polynomial curve fitting.

Lagrange multipliers.

The Pontryagin maximum principle.

Quadratic programming.

Convex programming.

Multi-objective programming.

Integer linear programming.

Deterministic dynamic programming.

Stochastic dynamic programming.

The linear-quadratic-Gaussian case of dynamic programming.
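To make one item on this list concrete: the normal equations of linear regression say that the least-squares coefficients b for a design matrix X and response y solve (XᵀX) b = Xᵀy. What follows is a minimal plain-Python sketch, assuming a small dense problem where X has full column rank. The helper names `normal_equations`, `solve`, and `least_squares` are hypothetical. In practice, QR- or SVD-based solvers are usually preferred, because forming XᵀX squares the condition number of X.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def normal_equations(X: Matrix, y: Vector) -> Tuple[Matrix, Vector]:
    """Form X^T X and X^T y for the problem: minimize ||X b - y||^2."""
    n_cols = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n_cols)]
           for i in range(n_cols)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n_cols)]
    return xtx, xty


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented copy [A | b] so the caller's data is not modified.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot spot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if M[col][col] == 0.0:
            raise ValueError("X^T X is singular; X lacks full column rank")
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X: Matrix, y: Vector) -> Vector:
    """Least-squares coefficients via the normal equations."""
    xtx, xty = normal_equations(X, y)
    return solve(xtx, xty)


# Points lying exactly on y = 1 + 2x, with a column of ones for the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(least_squares(X, y))  # intercept and slope, close to 1 and 2
```

The defining property of the solution is that the residual y − Xb is orthogonal to every column of X. That orthogonality is exactly what the equation (XᵀX) b = Xᵀy states.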



I've been doing data science for a while now, and for me personally:

Not really. The SVD is much more important. No. Yes. Yes. No (R-N), yes (CE). Yes. Yes. Yes. Personally, no. Only in the usage of MCMC. Yes. Yes. No. Of course. All the time. Yes. Yes. The most I'll do is remember to use the sample standard deviation. No. Yes. No. Yes. Yes. Yes. No. No. Yes. I just use a solver. See above. See above. Of course. Yes. Yes. Not privileged w.r.t. other bases. Of course. I've never needed it. Ditto. As another tool in the toolbox. They would not be my first or second choice. Yes. No. No. Yes. No. Yes. Yes. Yes. No.
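"Remember to use the sample standard deviation" usually means dividing the sum of squared deviations by n − 1 instead of n (Bessel's correction), which makes the variance estimate unbiased for i.i.d. samples. Below is a small stdlib-only sketch. `statistics.variance` and `statistics.pvariance` are real Python standard-library functions (sample and population variance); `biased_variance` and `unbiased_variance` are hypothetical helpers that spell out the two formulas.

```python
import statistics
from itertools import product
from typing import Sequence


def biased_variance(xs: Sequence[float]) -> float:
    """Divide by n: the Gaussian maximum likelihood estimate, biased low."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def unbiased_variance(xs: Sequence[float]) -> float:
    """Divide by n - 1 (Bessel's correction): unbiased for i.i.d. samples."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
# The hand-written versions agree with the standard library.
assert biased_variance(data) == statistics.pvariance(data)               # 32/8 = 4.0
assert abs(unbiased_variance(data) - statistics.variance(data)) < 1e-12  # 32/7

# Exact check of (un)biasedness: draw samples of size 2, with replacement,
# from a fair coin on {0, 1}, whose true variance is 0.25.
samples = list(product([0.0, 1.0], repeat=2))
avg_unbiased = sum(unbiased_variance(s) for s in samples) / len(samples)
avg_biased = sum(biased_variance(s) for s in samples) / len(samples)
print(avg_unbiased, avg_biased)  # 0.25 0.125
```

Averaging over all four equally likely samples gives the estimator's exact expectation. The n − 1 version hits the true variance, and the n version comes out low by the factor (n − 1)/n.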


The topics you mention are maths or applied maths topics. "Data Science" is a bubbly term that roughly means "take that big dump of data and give me some advice on how to make more money", so your list, very sadly, has little relevance to it.


Most of those topics I listed are supposed to be good at taking data and saying how to "make more money"!


I've seen in other threads that you recommended Neveu's book to cover some probability theory topics. Care to explain whether Halmos & Rudin would be sufficient prerequisites?


Halmos, Measure Theory, is a good prerequisite to Neveu. Rudin, Principles, is a bit too little. Instead, the first half, the real half, of Rudin's Real and Complex Analysis is a good prerequisite. So is Royden's Real Analysis.

Neveu is elegant beyond belief, but Breiman, Probability, the SIAM book, available in paperback, is darned good: usually easier than Neveu, less elegant, closer to applications, and without some of the special Tulcea material in the back of Neveu. K. L. Chung also has a good, comparable book. Even if you want Neveu to be your main probability book, which is fine, you should likely have alternative treatments.

Of course, there is Loeve, Probability -- written in English but somehow sounding like French. It has a lot, a little too much, but I liked the topics I studied in it. It turns out, Neveu and Breiman were both Loeve students.

Halmos, Measure Theory, is darned fun to read: It has the three series theorem and a famous exercise on regular conditional probabilities.

I learned the stuff from a course by A. Karr, a star student of E. Cinlar. Karr's course was the best course of any kind I ever took in school. Powerful material, beautifully presented, each day it was a shame to erase the board.

The exercises in Neveu are usually harder than the ones in Halmos, Breiman, and Chung.

Neveu makes probability a crown jewel of civilization.

The summer after Karr's course, I sat in the library for six weeks and walked out with a 50 page manuscript that was all the research and the first draft of my dissertation. Net, probability at the level of Neveu is darned powerful stuff, makes a lot in research, and research for applications, really easy -- that is, you really know just what the heck you are doing and can knock off new results having fun sitting in bed next to your wife while she watches TV (warning -- not gender neutral!).

What I've outlined is sometimes just called graduate probability. The biggest difference is that the whole subject makes daily use of measure theory.

I don't know how much you need in probability before starting on graduate probability. In my case, graduate probability was my first serious study of probability, and I never felt that I was not prepared.

But in my career I'd done a lot of practical work in both probability and statistics -- e.g., multivariate statistics, hypothesis testing, stochastic processes, digital filtering, the fast Fourier transform, beam forming (a case of antenna theory), power spectral estimation (US Navy sonar type stuff), how to get the central limit theorem out of digital filtering, and more, random number generation, etc. That work was plenty of intuitive background for graduate probability.

But in much of that work I was struggling due to what is, at that level, a commonly weak basic knowledge of probability. So, after those struggles, seeing graduate probability be all clean and powerful was great.

I can't advise on just how much elementary probability you might need to have enough intuition to be comfortable with graduate probability. I will say, you do need both the intuitive experience and also the solid math.

I feel sorry for people who work in prob/stat without a background in grad prob: the elementary stuff is too often just confused, from poor understanding rooted in a poor background.

The sources I mentioned above were really the first sources from which I did any real study. Net, the elementary material of prob/stat is really too simple to be taken very seriously. So, for your first serious effort, just go for graduate probability from the sources above.

The Neveu, etc., material is much of the foundation for the secret sauce of my startup.


Thanks for the insights. Chung seems quite doable at my current level. I skimmed through it sometime ago. I borrowed a copy of Neveu and it seemed a bit harder.

Care to share other references you like? Real & complex analysis and algebra, in particular, would be most welcome.


> Real & complex analysis and algebra,

I've mentioned books I've spent at least some significant time with.

There are lots more books on my shelves that look good, have good recommendations, etc. but I haven't paid much attention to.

My interest in algebra is a bit meager -- I'm not seriously interested in number theory, algebraic geometry, algebraic topology, etc.

For real analysis, the books I mentioned seem to me to provide really good sources. Of course there is much more to analysis, e.g., functional analysis. And there's a lot to stochastic processes. And much more to math.


If you want to dip your toes in algebraic geometry and functional analysis, you could do a lot worse than Lang's book on SL(2,R) for the former and Bollobas' for the latter.

cf. http://maths-magic.ac.uk/course.php?id=339


Thanks



