Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

How could this work with less than 1 bit of data per key?

Assuming there are no duplicates, which is a sensible assumption, you’d need a minimum of 100,000,000 bits to store 100,000,000 unique entries larger than 1 bit with even a perfect hash function.



In general, when you're storing a list of numbers, there are many situations where you can go below 1 bit per number.

The easiest one to think about is storing the deltas between each number. Let's say 80% of your deltas are 5. If you use arithmetic encoding, then storing a 5 only takes about 1/3 of a bit. It's not hard to come up with probability distributions where the average amount of bits per entry is less than 1.

Also, back in the realm of perfect hashes, once you're more than half full it becomes more efficient to store the missing numbers. If your perfect hash has 100,003,000 possible outputs, then your worst case is around 50k unique entries. By the time you encounter 100k unique entries you only need to keep track of the 3000 you haven't seen yet.


Thank you for taking the time to explain this - it makes sense, and it’s interesting to invert the problem by storing missing numbers.


Maybe storing ranges or similar


Assuming no duplicates, the only case that would make sense would be if all but a single byte was different (sequentially across all records). Even then you’d end up with more than the number of bytes we’re talking about, even excluding the size of the index (which would be non-trivial).


Why don’t you just read what the guy said by following the links in the forum? Surely, you can find more explanation there that will answer some of your questions? Hahaha! :)


His thing has collisions, so it answers none of the questions.

Also they already did follow the link. That's why they said "they don’t actually store the keys, so the quote is misleading", which you responded to with a laugh and nothing else. And that happened many hours before you made this new comment.


Heh, yeah. "Store" can have multiple meanings.

I'm not sure that guy really understood what was going on. If he'd followed the links he would've found the code. Or at least a technical description. So why need to play dumb and ask here, while trying to control the discussion?

I don't like that kind of thing. If you're okay with it, alright. But that's not me.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: