
If the data set did contain 3e519 entries, then yes, it would certainly generate collisions. However, if you look at a more restrictive data set, let's say 5 email addresses per person alive, then you're looking at about 2^35 email addresses, which could easily be hashed with MD5 without a significant chance of collision (by the birthday bound, roughly (2^35)^2 / 2^129 = 2^-59 for a 128-bit hash).
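The birthday-bound estimate can be checked numerically. This is a minimal sketch that assumes MD5 outputs behave like uniformly random 128-bit values. That is a reasonable model for accidental collisions, but not for deliberately crafted ones. The function name `birthday_collision_probability` is hypothetical.

```python
import math

def birthday_collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate probability that at least two of n_items random values
    collide in a space of 2**hash_bits, using the birthday approximation
    p ~= 1 - exp(-n(n-1) / (2 * 2**hash_bits))."""
    space = 2 ** hash_bits
    # Python ints keep n(n-1) exact; true division then yields a float
    exponent = -n_items * (n_items - 1) / (2 * space)
    # expm1 preserves precision when the probability is extremely small
    return -math.expm1(exponent)

# About 2^35 addresses hashed into MD5's 128-bit output space
p = birthday_collision_probability(2 ** 35, 128)
print(f"{p:.2e}")  # → 1.73e-18
```

For comparison, the odds only approach 1 in 2.5 (about 0.39) once the set grows to around 2^64 entries.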

Instead of MD5 hashes, they could just as easily upload a Bloom filter, which would expose even less data and compress it significantly. However, generating matches that way would be more computationally expensive than a plain hash lookup.
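A rough idea of the Bloom filter alternative follows. This is a minimal sketch, not what any particular service uses. The class name, the SHA-256-based double hashing, and the 1% false-positive target are all illustrative assumptions. It uses the standard sizing formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash positions. At a 1% false-positive rate this works out to about 9.6 bits per address, versus 128 bits for each MD5 digest. Each lookup also has to probe k bit positions rather than compare a single hash.

```python
import hashlib
import math

class BloomFilter:
    """Bit array of num_bits bits with num_hashes positions per item.
    Membership tests may return false positives but never false negatives."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # m = -n ln p / (ln 2)^2 ; k = (m / n) ln 2
        m = math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2))
        self.num_bits = m
        self.num_hashes = max(1, round(m / expected_items * math.log(2)))
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from one SHA-256 digest via double hashing,
        # (h1 + i * h2) mod m, a common way to simulate k independent hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# Hypothetical example: 1,000 addresses at a 1% false-positive rate
bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
bf.add("alice@example.com")
print("alice@example.com" in bf)  # → True
print(len(bf.bits))               # → 1199 bytes (vs. 16,000 bytes of MD5 digests)
```

The receiver can only test candidate addresses against the filter. It cannot list the addresses in it, which is where the "exposes less data" property comes from.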
