Do you understand the DFT? The FFT just calculates that faster.
There is an intuition to it, something like this: suppose we want to calculate a 2048-element DFT. If we instead calculate a pair of 1024-element DFTs, one over each half of the data, the two windows already contain all the same high-frequency information. What we're missing is the lower frequency: the one that goes through just one cycle over the 2048 window. But we don't need 2048 points for that; the low frequency doesn't contain that much information. The FFT reveals how it can be obtained from the two halves. The two halves have the necessary information in their lowest-frequency components; we don't need to sample 2048 points of the signal again; we just look into the half-sized DFT results and put them together.
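To make the split into halves concrete, here is a rough sketch at a toy size of 8 points rather than 2048. It assumes the common sign convention exp(-2*pi*i*k*n/N), and the helper names dft and dft_from_halves are made up for illustration. In this particular arrangement (usually called decimation in frequency) the even-numbered outputs are simply the sum of the two half-size DFTs, while the odd-numbered outputs come from one more half-size DFT of the difference of the halves, after multiplying by powers of the root of unity.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]

def dft_from_halves(x):
    """Build a length-N DFT from half-size DFTs (N must be even)."""
    n_pts = len(x)
    half = n_pts // 2
    a, b = x[:half], x[half:]          # first and second half of the data

    # Even outputs: the two half-size DFTs added bin by bin.
    even = [p + q for p, q in zip(dft(a), dft(b))]

    # Odd outputs: half-size DFT of (a - b), twisted by powers of W.
    w = cmath.exp(-2j * cmath.pi / n_pts)
    odd = dft([(p - q) * w ** n for n, (p, q) in enumerate(zip(a, b))])

    out = [0j] * n_pts
    out[0::2] = even                   # bins 0, 2, 4, ...
    out[1::2] = odd                    # bins 1, 3, 5, ...
    return out

signal = [1, 2, 3, 4, 0, -1, 2.5, 7]
max_error = max(abs(r - g) for r, g in zip(dft(signal), dft_from_halves(signal)))
print("largest difference from the direct DFT:", max_error)
```

Applying the same trick recursively inside dft_from_halves, instead of calling the naive dft, is what turns this into an actual FFT.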
I actually find that terms like "frequency" are a distraction here. Maybe I am odd, but for me the FFT is easier to understand when you think of the DFT in terms of roots of unity (i.e. Nth roots of 1) and write out the DFT as a matrix involving those roots. If you take the even-numbered rows of the NxN DFT matrix (counting rows from 0) and keep only the first half of each, you get a DFT matrix of size (N/2)x(N/2); this is because if W is a primitive Nth root of 1 then W^2 is a primitive (N/2)th root. The other key observation is that the first half of each odd row can be computed from the first half of the preceding even row by multiplying by powers of W (this is basic algebra), and the second half of every row repeats the first half up to sign, because W^(N/2) = -1. So from an (N/2)x(N/2) DFT matrix you can compute an NxN DFT matrix (hence the FFT).
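The row relationships are easy to check numerically. The following is only a sketch for N = 8, with rows and columns counted from 0; the names root, dft_matrix and close are invented for the example, and the sign convention exp(-2*pi*i/N) for W is an assumption (the other sign works the same way).

```python
import cmath

def root(n):
    """A primitive n-th root of unity (sign convention is an assumption)."""
    return cmath.exp(-2j * cmath.pi / n)

def dft_matrix(n):
    """n x n DFT matrix: entry (r, c) is W^(r*c)."""
    w = root(n)
    return [[w ** (r * c) for c in range(n)] for r in range(n)]

def close(a, b, tol=1e-9):
    return abs(a - b) < tol

N = 8
H = N // 2
W = root(N)
big, small = dft_matrix(N), dft_matrix(H)

# Even rows, first half of the columns: exactly the half-size DFT matrix.
even_first = all(close(big[2 * m][c], small[m][c])
                 for m in range(H) for c in range(H))

# Odd rows, first half: the half-size matrix times W^c, column by column.
odd_first = all(close(big[2 * m + 1][c], small[m][c] * W ** c)
                for m in range(H) for c in range(H))

# Second half of each row: same as the first half for even rows,
# negated for odd rows, because W^(N/2) = -1.
even_second = all(close(big[2 * m][c + H], big[2 * m][c])
                  for m in range(H) for c in range(H))
odd_second = all(close(big[2 * m + 1][c + H], -big[2 * m + 1][c])
                 for m in range(H) for c in range(H))

print(even_first, odd_first, even_second, odd_second)
```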
There are two reasons I like this approach. The first is that it is the only way to understand the number-theoretic transform, which is very useful in cryptography (among other things). The second is that you can use the same reasoning to divide the DFT in other ways (e.g. into thirds, or fifths, etc.). The downside is that it is more abstract, so if all you care about is signal processing, you need to somehow connect abstract "roots of unity" with complex frequencies (but that is probably not too bad if you are comfortable with complex frequencies).
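As a small illustration of the number-theoretic transform, here is a toy sketch that replaces the complex root of unity with an integer root of unity modulo a prime. The modulus 17, the root 2 and the length 8 are choices made for this example only (2^8 = 256 = 1 mod 17 and 2^4 = 16 = -1 mod 17, so 2 is a primitive 8th root of 1 mod 17); real uses pick much larger parameters and a fast algorithm rather than this naive O(N^2) sum.

```python
P = 17   # small prime modulus, chosen only for illustration
W = 2    # primitive 8th root of unity mod 17
N = 8

def ntt(x, w, p):
    """Naive transform: X[k] = sum_j x[j] * w^(j*k) mod p."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, p) for j in range(n)) % p for k in range(n)]

def intt(big_x, w, p):
    """Inverse: same sum with w^-1, then scale by n^-1 (all mod p)."""
    n = len(big_x)
    w_inv = pow(w, p - 2, p)   # modular inverse via Fermat's little theorem
    n_inv = pow(n, p - 2, p)
    return [(n_inv * v) % p for v in ntt(big_x, w_inv, p)]

data = [3, 1, 4, 1, 5, 9, 2, 6]
transformed = ntt(data, W, P)
recovered = intt(transformed, W, P)
print(transformed, recovered)
```

Everything here is exact integer arithmetic, so there is no rounding and no notion of "frequency" to lean on; only the root-of-unity algebra carries over.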
Roots of unity are easily related to frequencies. They are just ways of walking around the unit circle in equal steps. If we want N steps, we take the N-th roots of unity. These walks around the unit circle give us a discretized cosine (the real part) and its quadrature sine (the imaginary part). It all fits together because rotation is multiplication by a complex number of magnitude 1, and harmonic motion is related to rotation.
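A quick way to see the walk around the circle, sketched for N = 8 with the counterclockwise root exp(2*pi*i/N) (the direction is an arbitrary choice for this example):

```python
import cmath
import math

N = 8
w = cmath.exp(2j * math.pi / N)     # one step of 1/N of a turn

# Start at 1 and keep multiplying by w: each multiplication is one rotation step.
points = [1 + 0j]
for _ in range(N - 1):
    points.append(points[-1] * w)

for k, z in enumerate(points):
    angle = 2 * math.pi * k / N
    # The real part traces a sampled cosine, the imaginary part a sampled sine.
    print(k, round(z.real, 3), round(math.cos(angle), 3),
          round(z.imag, 3), round(math.sin(angle), 3))

back_at_start = points[-1] * w      # after N steps we are back at 1
```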