
A wonderful article, thank you.

Rather than using multiple hash functions, would it make more sense to use a single algorithm over (prefix | input), with k different prefixes? That might allow the hashes to be computed in parallel (with SIMD, for example), and the partial hash state of each prefix could be cached and reused.

Edit: looks like there has been some research on this: https://ieeexplore.ieee.org/document/8462781
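A minimal sketch of the idea in Python, assuming SHA-256 as the single algorithm and a 4-byte counter as the prefix (both are illustrative choices, not from the linked paper). The key point is that each prefix's hash state is computed once and resumed via copy() for every item:

```python
import hashlib

def make_hashers(k: int, m: int):
    # Cache the hash state of each of the k prefixes up front.
    states = []
    for i in range(k):
        h = hashlib.sha256()
        h.update(i.to_bytes(4, "little"))  # the i-th prefix
        states.append(h)

    def hashes(item: bytes):
        out = []
        for st in states:
            h = st.copy()  # resume from the cached prefix state
            h.update(item)
            out.append(int.from_bytes(h.digest()[:8], "little") % m)
        return out

    return hashes
```

With a fixed-size prefix, the k hashes differ only in their initial state, which is what makes batching them amenable to SIMD in a lower-level implementation.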



It’s fairly common to use two hash functions and then derive the remaining hashes as h_i(x) = h1(x) + i * h2(x).

Paper: https://link.springer.com/chapter/10.1007/11841036_42
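A short Python sketch of that double-hashing scheme, splitting one SHA-256 digest into the two base hashes (the digest-splitting detail is my own choice for the example, not from the paper):

```python
import hashlib

def k_hashes(item: bytes, k: int, m: int):
    # Derive k Bloom-filter indices from two base hashes:
    # h_i(x) = h1(x) + i * h2(x)  (mod m)
    digest = hashlib.sha256(item).digest()
    h1 = int.from_bytes(digest[:8], "little")
    # Force h2 odd so it is coprime with a power-of-two m
    # and the k probes don't collapse onto few slots.
    h2 = int.from_bytes(digest[8:16], "little") | 1
    return [(h1 + i * h2) % m for i in range(k)]
```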


There are a lot of optimizations you can do on top of the classic Bloom filter. In short, you can use a single hash to compute an offset into a table of bit patterns, and SIMD lets you perform multiple lookups in parallel. I wrote a blog post about more advanced Bloom filters if you're curious:

https://save-buffer.github.io/bloom_filter.html
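A scalar Python sketch of the pattern-table trick described above: one hash both selects a word in the filter and picks a precomputed 64-bit pattern with k bits set, which is OR-ed into that word. All names and parameters here are illustrative, not taken from the linked post:

```python
import random

class PatternBloom:
    def __init__(self, n_words=1024, n_patterns=256, k=4, seed=0):
        rng = random.Random(seed)
        self.words = [0] * n_words
        self.n_words = n_words
        # Precompute a table of 64-bit patterns, each with k random bits set.
        self.patterns = []
        for _ in range(n_patterns):
            p = 0
            for b in rng.sample(range(64), k):
                p |= 1 << b
            self.patterns.append(p)

    def _locate(self, item):
        # One hash: low bits pick the word, high bits pick the pattern.
        # (Python's built-in hash is salted per process; fine for a demo.)
        h = hash(item) & 0xFFFFFFFFFFFFFFFF
        word = h % self.n_words
        pattern = self.patterns[(h >> 32) % len(self.patterns)]
        return word, pattern

    def add(self, item):
        w, p = self._locate(item)
        self.words[w] |= p

    def __contains__(self, item):
        w, p = self._locate(item)
        return self.words[w] & p == p
```

Because each item touches exactly one word, a membership test is a single load plus a mask compare; a SIMD version can test several candidate words (or several items) per instruction.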



