This was all making sense until the hashing part. To me it seemed that a slight change in recorded volume would change the magnitude and, as a result, the hash. Perhaps normalization over the recorded sample would help, but I didn't see that. Still, I'm surprised the hash is so simple and still works.
Around the time ML/AI was starting to take off again (I want to say 2011/2012-ish), I was doing research into applying OCR methods to recognise engineering symbols on drawings (both hand-drawn, as a graphical representation, and CAD, as a vector representation). OCR uses locality-sensitive hashing techniques, where minute changes, like a couple of pixels' difference here and there, in theory result in similar hashes. What you really need to do is look at your windowing and overlaps and tune those to get localities that are actually useful.
This worked quite well for me. After running a "training set" of sorts, I created a small tool that ran over a quarter-million engineering drawings to get counts of each symbol from the set. (I'm going to hand-wave some implementation complexity here, but essentially:) if an exact match couldn't be found, the item being searched would show with thumbnails of its nearest neighbours off to the side, and you could select whether it was essentially the same as one of those. (Sort of like "is this, this person?" in Google Photos and the Apple equivalent.)
The next step after this POC was to find which symbols were used most often next to other symbols, and to offer those as contextual menu items in other CAD software to speed up drawing production, since a lot of time was spent placing a symbol and then stopping to text-search for the item you were placing next.
Unfortunately I was retrenched soon after and didn't get to progress. A couple of years later I took this a dimension further and was prepping for a PhD that would look at this for 3D objects and models using naively generated voxel representations. Almost by accident I found that a group at (or backed by) DARPA had a patent pending on a similar method. In retrospect I should have just gone all in on a photogrammetry-based method, since that kind of won out as the superior method, but it was still early days.
The moral of the story is that everything is pattern recognition and simple methods from before this last decade of ML/AI could do some cool stuff too.
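To make the locality-sensitive hashing idea above a bit more concrete, here is a minimal sketch using random-hyperplane signatures, which is one common LSH family and not necessarily what I used at the time. The 8x8 "plus" symbol, the names `make_planes`, `signature`, `hamming`, and the 128-bit signature size are all made up for illustration. The point is only that a couple of flipped pixels moves the signature a little, while an unrelated shape moves it a lot, so ranking by Hamming distance gives you the "nearest neighbours" thumbnails.

```python
import random

def make_planes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(pixels, planes):
    """One bit per hyperplane: which side of the plane the pixel vector falls on."""
    return [1 if sum(p * x for p, x in zip(plane, pixels)) >= 0 else 0
            for plane in planes]

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return sum(x != y for x, y in zip(a, b))

# Toy 8x8 "symbol": a plus sign, flattened row by row into 64 pixels.
plus = [1 if (i % 8) in (3, 4) or (i // 8) in (3, 4) else 0 for i in range(64)]

# The same symbol with two pixels flipped, and its inverse as an unrelated shape.
smudged = list(plus)
smudged[0] = 1   # stray pixel in the corner
smudged[3] = 0   # missing pixel at the top of the vertical bar
inverted = [1 - p for p in plus]

planes = make_planes(n_bits=128, dim=64)
library = {"smudged": smudged, "inverted": inverted}
query = signature(plus, planes)

# Rank library entries by Hamming distance to the query signature.
ranked = sorted(library,
                key=lambda name: hamming(query, signature(library[name], planes)))
print(ranked)
```

In a real tool you would hash windows of the drawing rather than a whole fixed grid, which is where the windowing and overlap tuning mentioned above comes in.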
That's basically what it does: it finds peaks relative to total power.
You can think of it as identifying continents by their constellations of mountain tops, which is robust to whatever chaos might be happening at sea level.
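A rough sketch of why that is robust to volume, assuming a toy spectrogram stored as a list of rows (time) by columns (frequency bin). The names `find_peaks` and `fingerprint`, the `rel_threshold` and `fan_out` parameters, and the `(f1, f2, dt)` hash tuple are illustrative assumptions, not the actual implementation being discussed. Peaks are kept only if they are local maxima and stand out against the mean magnitude, so scaling the whole recording by a constant gain leaves the peak positions, and therefore the hashes, unchanged.

```python
def find_peaks(spectrogram, rel_threshold=2.0):
    """Return (time, freq) cells that are strict local maxima and exceed
    rel_threshold times the mean magnitude of the whole spectrogram."""
    values = [v for row in spectrogram for v in row]
    mean = sum(values) / len(values)
    n_t, n_f = len(spectrogram), len(spectrogram[0])
    peaks = []
    for t in range(n_t):
        for f in range(n_f):
            v = spectrogram[t][f]
            if v < rel_threshold * mean:
                continue
            # Compare against the (up to) 8 surrounding cells.
            neighbours = [spectrogram[tt][ff]
                          for tt in range(max(0, t - 1), min(n_t, t + 2))
                          for ff in range(max(0, f - 1), min(n_f, f + 2))
                          if (tt, ff) != (t, f)]
            if all(v > n for n in neighbours):
                peaks.append((t, f))
    return peaks

def fingerprint(peaks, fan_out=3):
    """Pair each peak with the next few peaks; hash = (freq1, freq2, time delta)."""
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.add((f1, f2, t2 - t1))
    return hashes

# Toy magnitudes: mostly "sea level" with three "mountain tops".
quiet = [
    [1, 1, 1, 1, 1],
    [1, 9, 1, 1, 1],
    [1, 1, 1, 8, 1],
    [1, 1, 1, 1, 1],
    [1, 7, 1, 1, 1],
]
# Same recording, four times louder.
loud = [[v * 4.0 for v in row] for row in quiet]

print(find_peaks(quiet))  # [(1, 1), (2, 3), (4, 1)]
print(fingerprint(find_peaks(quiet)) == fingerprint(find_peaks(loud)))  # True
```

Only positions go into the hash, never magnitudes, which is why no explicit normalization step is needed in this sketch.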