I've always wanted to deeply, fundamentally understand a kalman filter, but I us...

jtolmar · on March 6, 2021

For an intuitive understanding, try to build one in one dimension. Imagine you have a position that's drifting at a roughly constant, but noisy, rate, modeled by a normal distribution. You also have an occasional 1D GPS reading about that position. As building blocks, check out conjugate priors, specifically normal-normal [1] and the algebra of random variables, specifically the sum of normally distributed variables [2].

Briefly:

If you have a normal distribution as a prior and a normal distribution as new evidence, then applying Bayes Theorem on them results in a new normal distribution (times a constant we'll ignore). The calculation can actually be pretty simple - the mean is a weighted average, weighted by precision (precision is 1/variance), and the precision is the sum of the two precisions.

If you have two normal distributions modeling unknown variables a and b, and you know x=a+b, then you can model your knowledge of x as another normal distribution. The calculation for this one is also simple - its mean is the sum of the two means (intuitively has to be) and its variance is the sum of the two variances (I've seen several math books say it's surprising. It's very convenient though).

So you have all the pieces you need to do (new state guess = old state knowledge + noise from passage of time) and (new state knowledge = new state guess | new observation), where all the variables are normal distributions, and the new knowledge ends up being a normal distribution too, so you can use it as the old knowledge next time you get information.

[1] https://en.wikipedia.org/wiki/Conjugate_prior#When_likelihoo...

[2] https://en.wikipedia.org/wiki/Sum_of_normally_distributed_ra...

kqr · on March 6, 2021

This is a great suggestion. For any topics like these, starting in very low dimensional space and working through an example is a fantastic way to understand it.

I'd also suggest not only trying to imagine this in your head: implement it in code. That's how I truly learn these types of things.

For extra fun, you can make it into a game. After you've implemented the filter, create a way for a human to input a guess at the filtered output. Score it depending on how close it is. This way you get to practise your intuition for it too.

RogerL · on March 6, 2021

At the risk of sounding pompous and self-serving, try my book, I tried really hard to give the plain language explanation, and to build knowledge through intuition and thought experiments.

https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Pyt...

shorts_theory · on March 6, 2021

This was a great resource when I was learning about the KF, EKF, and UKF last year :-) Thanks a lot for making this!

bfrog · on March 6, 2021

I think this is what I've always been looking for, the notation used to describe kalman filters is dense, this looks like my kind of read, thank you for sharing!

jvanderbot · on March 6, 2021

The only way is to derive it.

The KF maximizes the likelihood of the observations..

You can easily form the optimization problem, "what state x maximizes the likelihood of my observations z". I had a writeup to this effect in my class notes from Prof Roumeliotis' class. I'll see if I can post it, but you could probably derive it following the KF wiki page.

_Microft · on March 6, 2021

Ask me a question or two about particular math notations and I will try to answer it.

I'll just go ahead and explain indices and summation in the meantime already because these are so super-common that you basically cannot read math things without knowing them. I will denote things LaTeX-like, that is subscript with _ and superscript with ^, if multiple things go in there, they will be enclosed in {}.

---

First: indices, usually written as sub-script, are super-common. They might be separated by a comma if there are more than one (but they do not have to be. In that case, you need to check which variables make sense as indices.) Usually variables are just single letters and not superWordyDescriptionsOfThingsBecauseCodeCompletionIsDifficultWithPenAndPaper.

Examples:

a_i is like a[i]

m_{ij} is element at position i, j in matrix M, i.e. ~ M[i][j]

---

Summation is also very common, denoted via symbol Σ . That's the upper-case greek letter "sigma". This one will have a few things accompanying it: below it is a variable name and most likely an assignment of a value to it. That's the first value that the variable will take. Above, the last value the variable should have. The expression behind the symbol is evaluated for each of these values and all integers in between and summed up. (It might also appear with just the varaible name below it, in which case it means: sum over all values of the variable).

Example: Summing up the squares of all integers from 3 to 6

Σ_{i=3}^6 i^2

(Properly rendered here: https://wikimedia.org/api/rest_v1/media/math/render/svg/a60f... )

Pseudocode for this is:

  for(i = 3; i <= 6; i++)
    sum += i^2

---

Multiplication: there is also a similiar one for multiplication that I will include because it is so easily explained when having discussed summation already. It's denoted by the greek upper-case letter Π (Pi)

Π works like Σ but the results of the expressions are multiplied instead.

Example: Π_{i=3}^6 i^2 = 3^2 * 4^2 * 5^2 * 6^2

Pseudocode:

  for(i = 3; i <= 6; i++)
    product *= i^2

This is only meant to help you translate a few formulas into forms you might easier work with. Following a university math course for a single semester or maybe two might get you a long way as well.