An off-topic question:
I've studied statistics in my course (engineering) but I haven't gone deep. I know the concepts superficially — exponential, polynomial, and so on.
In this situation, what book would you recommend to me? I want to be able to reason through a thought like that, more real-world stuff.
That's a little bit harder for me to answer, because I'm not familiar with anything that explains it the way I presently understand it. Most of the really insightful books start with sigma-algebras and Borel sets, which are a little hard to understand at first and then get promptly ignored for most of the rest of the book. Basically, in some proofs you need to say that a statement is "almost surely" true, because you can always add outcomes to which you assign probability 0, and you can often use those to 'technically' break a theorem.
I would say that the key ideas for an engineer to know about probability and statistics are: (1) continuous random variables [i.e. a probability density f so that Pr(x < X < x + dx) = f(x) dx], and (2) the Dirac delta function, which allows all of the statements about continuous random variables to carry over to discrete random variables, half-discrete half-continuous random variables, and all of that stuff.
Once you know those, you can start to define mean and variance and you can begin to get a handle on independence [f(x, y) = g(x) h(y)] and how to add two random variables [int dx f(x, z - x) gives a distribution for Z = X + Y].
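To see that convolution formula in action, here is a minimal Python sketch (all names and numbers are my own illustration, not from any textbook): it approximates ∫ g(x) h(z − x) dx with a Riemann sum for two independent Uniform(0, 1) variables, whose sum is known to have the triangular density f(z) = z on [0, 1] and 2 − z on [1, 2].

```python
# Density of Z = X + Y for independent X, Y via the convolution
# integral  f_Z(z) = ∫ g(x) h(z - x) dx  (here g = h = Uniform(0,1)).
def uniform_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve_at(z, g, h, lo=-2.0, hi=3.0, n=20000):
    # Crude Riemann-sum approximation of the integral over [lo, hi].
    dx = (hi - lo) / n
    return sum(g(lo + i * dx) * h(z - (lo + i * dx)) for i in range(n)) * dx

# Check a couple of points against the triangular density:
print(round(convolve_at(0.5, uniform_pdf, uniform_pdf), 2))  # ≈ 0.5
print(round(convolve_at(1.0, uniform_pdf, uniform_pdf), 2))  # ≈ 1.0
```

The same `convolve_at` works for any pair of densities you can write as Python functions, which is a handy sanity check before reaching for the algebra.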
Most importantly, you get as a near-freebie this important theorem, which I very rarely see in the textbooks. It allows you to construct arbitrary random variables with Math.random(). Let F⁻¹(p) be the inverse cumulative distribution function for a density f(x), and let U be uniformly chosen on [0, 1]. Then F⁻¹(U) is distributed according to f(x). Proof: because F is increasing, the inequality x < F⁻¹(U) < x + dx is the same as the inequality F(x) < U < F(x + dx). Therefore Pr(x < F⁻¹(U) < x + dx) = Pr(F(x) < U < F(x + dx)) = F(x + dx) - F(x), by the properties of the uniform distribution and the fact that both F(x) and F(x + dx) lie in [0, 1]. For vanishing dx, F(x + dx) - F(x) = f(x) dx. QED.
This actually also helps when you realize that U doesn't have to be chosen just once; you can also take a uniform sampling of (0, 1), and under the transform F⁻¹(p) that sampling will have density f(x). So if you wanted a density defined on [0, Z] for which, asymptotically, f(10x) = 0.1 f(x), but you also wanted the points to be roughly evenly spaced for x < b, then you might want density f(x) ~ 1 / (b + x), x > 0. Then you have F(x) = log(1 + x/b) / log(1 + Z/b), and inverting this gives x(F) = b [(1 + Z/b)^F - 1].
That's the function you would use to create a lattice of points distributed with this density, plugging in F = k/N for k = 0, 1, ..., N. It's a very useful theorem. ^_^
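As a concrete sketch of that lattice construction (the values of b, Z, and N below are arbitrary picks of mine, not anything canonical), plug F = k/N into x(F) = b [(1 + Z/b)^F − 1]:

```python
# Lattice of N+1 points with density f(x) ∝ 1/(b + x) on [0, Z],
# built by pushing the evenly spaced quantiles F = k/N through the
# inverse CDF  x(F) = b * ((1 + Z/b)**F - 1).
b, Z, N = 1.0, 100.0, 10
points = [b * ((1 + Z / b) ** (k / N) - 1) for k in range(N + 1)]

# The endpoints land exactly on 0 and Z, and spacing grows
# geometrically: dense near 0, sparse near Z.
print([round(x, 2) for x in points])
```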
Once you can do that sort of stuff, your textbook should return to the basics of discrete events: Bernoulli trials (aka weighted-coin flips where heads is 1 and tails is 0), Geometric variables (the number of Bernoulli trials before you get a 1), Binomial variables (the sum of N Bernoulli trials), and their continuous limits (exponential, Poisson, Normal).
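Those discrete definitions are easy to simulate directly, which is a good way to internalize them. A small sketch (p and the sample size are my own illustration): build a Geometric variable by counting Bernoulli failures before the first success, and check the mean against the theoretical (1 − p)/p:

```python
import random

random.seed(1)  # fixed seed for reproducibility

# Geometric variable built from raw Bernoulli trials: count the number
# of 0s (failures) before the first 1, with success probability p.
p = 0.25

def geometric():
    count = 0
    while random.random() >= p:  # one Bernoulli trial per loop pass
        count += 1
    return count

draws = [geometric() for _ in range(100_000)]
mean = sum(draws) / len(draws)
print(round(mean, 1))  # the theoretical mean is (1 - p)/p = 3.0
```

Summing N such Bernoulli trials instead of counting to the first success would give you a Binomial sample by the same construction.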
Don't get too caught up, I'd caution, on Normal random variables. They're useful, but the standard-normal tables come with a lot of pointless overhead. (Most of the overhead goes away when you realize that a "Z-score" is just the "number of standard deviations away from the mean": X = mu + Z sigma. The rest of it is looking numbers up in tables and making sure you check where the table says "0" so that you know what area you've actually calculated.)
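In fact the whole table lookup collapses to two lines once you have an error function handy (Python's standard library has `math.erf`; the function name `normal_cdf` is my own):

```python
import math

# Z-score bookkeeping without the tables: standardize with
# Z = (X - mu)/sigma, then the left-tail area is the standard normal
# CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2.
def normal_cdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# E.g. the area within one standard deviation of the mean:
print(round(normal_cdf(1) - normal_cdf(-1), 4))  # → 0.6827
```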
The reason I'm saying all of that out loud is that I don't know a textbook which will give you all of that material, sorry.