There are a lot of optimizations you can do on top of the classic Bloom filter. In short, you can use a single hash to compute an offset into a table of bit patterns. SIMD lets you perform multiple lookups in parallel. I wrote a blog post about more advanced Bloom filters if you're curious:
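For anyone following along, here is a rough sketch of the "single hash into a table of bit patterns" idea. It is an illustration only, not taken from the blog post: the class name `PatternBloom`, the table sizes, the choice of BLAKE2b, and the way the hash bits are split are all assumptions. One 64-bit hash picks both a word in the filter and a precomputed pattern with `K` bits set; inserting ORs the pattern into the word, and a lookup checks that all pattern bits are present.

```python
import hashlib
import random

WORD_BITS = 64        # each filter slot is one 64-bit word (assumed)
K = 4                 # bits set per pattern (assumed)
NUM_PATTERNS = 1024   # size of the pattern table (assumed)
NUM_WORDS = 4096      # number of words in the filter (assumed)


def _make_patterns(seed=12345):
    """Precompute NUM_PATTERNS words, each with exactly K distinct bits set."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(NUM_PATTERNS):
        word = 0
        for bit in rng.sample(range(WORD_BITS), K):
            word |= 1 << bit
        patterns.append(word)
    return patterns


PATTERNS = _make_patterns()


def _hash64(item: bytes) -> int:
    """One 64-bit hash per item; BLAKE2b is just a convenient stand-in."""
    return int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "little")


class PatternBloom:
    def __init__(self):
        self.words = [0] * NUM_WORDS

    def _locate(self, item: bytes):
        h = _hash64(item)
        word_index = h % NUM_WORDS                    # low bits pick the word
        pattern = PATTERNS[(h >> 32) % NUM_PATTERNS]  # high bits pick the pattern
        return word_index, pattern

    def add(self, item: bytes) -> None:
        i, p = self._locate(item)
        self.words[i] |= p

    def __contains__(self, item: bytes) -> bool:
        i, p = self._locate(item)
        return (self.words[i] & p) == p
```

Usage would look like `f = PatternBloom(); f.add(b"key"); b"key" in f`. Each lookup touches a single word, which is the property that makes it natural to batch several lookups with SIMD in a lower-level language; this Python version does not attempt that.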
Rather than using multiple hash functions, would it make more sense to use a single algorithm over (prefix | input), with k different prefixes? This may allow computing those hashes in parallel, using SIMD for example, and caching the prehash state of the prefixes.
Edit: looks like there has been some research on this: https://ieeexplore.ieee.org/document/8462781
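To make the (prefix | input) idea concrete, here is a minimal sketch of caching the prehash state, assuming BLAKE2b from Python's `hashlib` as the single algorithm. The prefixes, `K`, `NUM_BITS`, and the helper names are hypothetical. Each prefix is absorbed once up front; per input, the saved state is copied and only the input is hashed.

```python
import hashlib

K = 4                # number of prefixes, i.e. derived hash functions (assumed)
NUM_BITS = 1 << 16   # size of the Bloom filter bit array (assumed)

# Hypothetical prefixes: one distinct byte string per derived hash function.
PREFIXES = [bytes([i]) * 16 for i in range(K)]


def _prehash(prefix: bytes):
    """Absorb the prefix once and keep the hash object as reusable state."""
    h = hashlib.blake2b(digest_size=8)
    h.update(prefix)
    return h


PREHASHED = [_prehash(p) for p in PREFIXES]


def bit_positions(item: bytes) -> list[int]:
    """Return K bit positions, equivalent to hashing prefix + item for each prefix."""
    positions = []
    for state in PREHASHED:
        h = state.copy()   # reuse the cached prefix state
        h.update(item)     # only the input is processed per call
        positions.append(int.from_bytes(h.digest(), "little") % NUM_BITS)
    return positions
```

One caveat on the caching part: block-based hashes only compress full blocks, so a prefix shorter than one block (128 bytes for BLAKE2b) is merely buffered and the copy saves very little work. The K computations are independent of each other, which is what would let a SIMD implementation run them side by side; this sketch just loops.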