Preface: I will joining Path this Summer, but I do not speak for the company in ...

jwegan · on Feb 8, 2012

That is incorrect. SHA1 still has no known collisions despite years of research and computing power dedicated to finding just one collision.

Edit: Furthermore since the set of valid emails and phone numbers is a very restricted set of input, it is extremely likely that there are literally no two valid email/phone numbers that SHA1 hash to the same value.

hythloday · on Feb 8, 2012

I agree that it's practically not a concern, but the local part of an email address[0] is up to 64 characters in an alphabet of size 72, and the domain part is 253+ characters in an alphabet of size 38, giving the valid email space a size of greater than 3e519, which is enough to guarantee collisions in SHA-512 and all of the SHA-3 finalists.

[0] http://tools.ietf.org/html/rfc3696

_3u10 · on Feb 8, 2012

If the set of data did contain 3e519 entries then yes it certainly would generate collisions, however it you look at a more restrictive set of data, lets say 5 emails per person alive then you're looking at about 2^35 email addresses which could easily be hashed by MD5 with out a significant chance of collision.

Instead of an MD5 they could just as easily upload a bloomfilter which would expose even less data and would compress it significantly, however it would be more computationally expensive to generate matches that way vs. hashing.

foobarbazetc · on Feb 9, 2012

That's irrelevant, since this isn't crypto.

This is matching user email addresses so they can spam you and your friends and grow their company on the back of dodgy practices.

Me1000 · on Feb 8, 2012

That's a fair assessment,

erikpukinskis · on Feb 8, 2012

It doesn't matter. The purpose of the hash isn't to uniquely identify users, it's to narrow the list of users that need to be sent down to the phone. If Path could send their entire user database to the phone, they wouldn't need to send the contacts to their server.

benwerd · on Feb 8, 2012

Replying to Me1000, with hashes it would still be trivial to create a system that determines when a friend's signed up, and alerts you. That doesn't require a full address book entry - or any contact details at all beyond an identifier. It's not like Path is actually notifying using any of the contact details it stores, which means it's either representative of wasteful coding on their side, or of something else going on.

Me1000 · on Feb 8, 2012

The main purpose is to send you a push notification when your friend joins, since a push notification can't execute client side code... that wouldn't work.

mikeash · on Feb 8, 2012

You can guarantee a unique hash for all practical purposes. If you manage to find a collision in a robust cryptographic hash then the world of computing will have far more important things to worry about than a social network getting slightly confused.

hythloday · on Feb 8, 2012

Well, this isn't quite right--the domain of the hash function is a numerical representation of the user, maybe a 64-bit int, so it's obvious that you can engineer a non-colliding hash (trivially: the hash function is XOR). What's more interesting is whether there's a hash function such that Path can't infer the social graph from user requests without the user's permission. It seems to me a hash would be both one-way (obviously) and dense (so that a randomly generated request from a user would have a good chance of matching another user).

I can't think of a hash function without a good public-key infrastructure, which is obviously beyond Path's remit. Anyone aware of a solution to this?

jakubw · on Feb 8, 2012

You could just use SHA-256. If it does collide then Path's malfunctioning will be the last of our concerns. :-)