If the data set did contain 3e519 entries then yes, it certainly would generate collisions. However, if you look at a more restrictive set of data, let's say 5 emails per person alive, then you're looking at about 2^35 email addresses, which could easily be hashed with MD5 without a significant chance of collision.
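To put a rough number on "no significant chance", here is a back-of-the-envelope sketch using the standard birthday-bound approximation. It assumes MD5 output behaves like a uniformly random 128-bit value for non-adversarial inputs; the function name `collision_probability` is just made up for illustration.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes hash outputs are uniformly distributed (an idealisation).
    """
    exponent = -n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)

n = 2 ** 35          # roughly 5 email addresses per person alive
p = collision_probability(n, 128)   # MD5 produces a 128-bit digest
print(f"approximate collision probability: {p:.3e}")
```

Under those assumptions the result is on the order of 2^-59, so an accidental collision among 2^35 addresses is not a practical concern. This says nothing about deliberately constructed collisions, which are a separate matter for MD5.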
Instead of MD5 hashes they could just as easily upload a Bloom filter, which would expose even less data and compress it significantly. However, it would be more computationally expensive to generate matches that way than with hashing.
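For anyone who hasn't seen one, a Bloom filter can be sketched in a few lines. This is only a minimal illustration, not how any particular service does it: the class name, the sizes, and the choice of salted SHA-256 to derive bit positions are all assumptions picked for the example.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus k derived hash positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions by salting the item with the hash index
        # (an illustrative choice, not the only way to do this).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

contacts = BloomFilter(num_bits=1024, num_hashes=4)
contacts.add("alice@example.com")
print("alice@example.com" in contacts)  # added items always match
print("bob@example.com" in contacts)    # usually no match; false positives are possible
```

Only the bit array would need to be uploaded, so individual hashes never leave the client. The flip side is that every lookup computes several hashes instead of one, and matches are only probabilistic, which is one way to read the extra matching cost mentioned above.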