Preface: I will joining Path this Summer, but I do not speak for the company in any way, nor have I spoken with them about the situation. This is a purely technical reply...
You cant guarantee a unique hash. When you hash users' data there is the possibility of collision; this probability grows with every new user. Without identifying data of some sort, it's difficult (impossible?) to get the exact user.
That is incorrect. SHA1 still has no known collisions despite years of research and computing power dedicated to finding just one collision.
Edit: Furthermore since the set of valid emails and phone numbers is a very restricted set of input, it is extremely likely that there are literally no two valid email/phone numbers that SHA1 hash to the same value.
I agree that it's practically not a concern, but the local part of an email address[0] is up to 64 characters in an alphabet of size 72, and the domain part is 253+ characters in an alphabet of size 38, giving the valid email space a size of greater than 3e519, which is enough to guarantee collisions in SHA-512 and all of the SHA-3 finalists.
If the set of data did contain 3e519 entries then yes it certainly would generate collisions, however it you look at a more restrictive set of data, lets say 5 emails per person alive then you're looking at about 2^35 email addresses which could easily be hashed by MD5 with out a significant chance of collision.
Instead of an MD5 they could just as easily upload a bloomfilter which would expose even less data and would compress it significantly, however it would be more computationally expensive to generate matches that way vs. hashing.
It doesn't matter. The purpose of the hash isn't to uniquely identify users, it's to narrow the list of users that need to be sent down to the phone. If Path could send their entire user database to the phone, they wouldn't need to send the contacts to their server.
Replying to Me1000, with hashes it would still be trivial to create a system that determines when a friend's signed up, and alerts you. That doesn't require a full address book entry - or any contact details at all beyond an identifier. It's not like Path is actually notifying using any of the contact details it stores, which means it's either representative of wasteful coding on their side, or of something else going on.
The main purpose is to send you a push notification when your friend joins, since a push notification can't execute client side code... that wouldn't work.
You can guarantee a unique hash for all practical purposes. If you manage to find a collision in a robust cryptographic hash then the world of computing will have far more important things to worry about than a social network getting slightly confused.
Well, this isn't quite right--the domain of the hash function is a numerical representation of the user, maybe a 64-bit int, so it's obvious that you can engineer a non-colliding hash (trivially: the hash function is XOR). What's more interesting is whether there's a hash function such that Path can't infer the social graph from user requests without the user's permission. It seems to me a hash would be both one-way (obviously) and dense (so that a randomly generated request from a user would have a good chance of matching another user).
I can't think of a hash function without a good public-key infrastructure, which is obviously beyond Path's remit. Anyone aware of a solution to this?
You cant guarantee a unique hash. When you hash users' data there is the possibility of collision; this probability grows with every new user. Without identifying data of some sort, it's difficult (impossible?) to get the exact user.