
We have an approach to this which doesn't modify the password hashing at all, so it can't possibly reduce the strength: we store users and password hashes (currently bcrypt * ) in two separate tables, and the key used to look up a given user's password hash is an encrypted version of their user id. The encryption key for doing this transformation is loaded dynamically at app startup and has some extra security precautions around storing it, so it's not baked into either the source code or a config file.

The upshot is that if you get a db dump and are able to brute force some bcrypt hashes, you won't know what usernames they go with. If you get a db dump and our source code, you're still out of luck. If you get ahold of an old server hard drive, you're out of luck. If you root a running server and inspect the process memory, you can obtain this key.

This scheme also allows the mapping key to be rolled, which would immediately invalidate all passwords in the system.

*we also version our password hash records so we can migrate from bcrypt to a new scheme fairly painlessly if it's warranted in the future.
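A minimal sketch of how such a lookup might work. All names here are hypothetical, and the HMAC-based mapping is my assumption: the commenter says "encrypted," but any deterministic keyed transform of the user id gives the same lookup property.

```python
import hmac
import hashlib

# Hypothetical sketch of the scheme described above. The mapping key is
# assumed to be loaded dynamically at startup, never from source or config.
MAPPING_KEY = b"example-key-loaded-at-startup"  # placeholder for illustration


def password_row_key(user_id: int) -> str:
    """Deterministic keyed mapping from user id to password-table key.

    An attacker with only a DB dump (and no MAPPING_KEY) cannot link a
    cracked bcrypt hash back to the user row it belongs to.
    """
    return hmac.new(MAPPING_KEY, str(user_id).encode(), hashlib.sha256).hexdigest()


# users table:           id -> username, ...
# password_hashes table: password_row_key(id) -> (bcrypt_hash, version)
```

Because the mapping is deterministic, lookups at login time are a single indexed read; rolling MAPPING_KEY changes every row key at once, which is what invalidates all passwords.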



Well, let's imagine an attack scenario.

As an attacker, I get SQL access to your DB (meaning no access to the encryption key). I then download the user names, and the hashes. I then attack the hashes offline. I recover only the weakest few percent (since you're using bcrypt). But since the weakest few are those most likely to be re-used (both by different users and by a single user across sites), they are going to be both more valuable to me and easier for the next steps:

Then, I take the highest frequency passwords and the user table, and I start validating them online in your system. Now if I do that too quickly, you'll notice and I'll be shut down. And if I do that all from the same IP, I'll be shut down.

But what if I had a botnet I could distribute the load across? What if I kept my request rate small enough to stay under the radar of even a moderate-scale system?

I would expect to start seeing meaningful results within days.

If you had 1000 users, then I could surmise that you don't have much traffic, and hence keep the request rate down to perhaps 100 per day. In 10 days I'd have at least a few u/p combinations that I know for a fact worked.

If you had 1000000 users, I could ramp it up quite a bit higher, to perhaps 1000 or 10000 per day.

And since they all came from separate IP addresses, it could be rather difficult for you to tell an attack was going on unless you were looking specifically for it.

Does that mean you should stop immediately? No. It's not that bad of a scheme. But be aware that it doesn't give you (or your users) the level of protection that it may appear to on the surface.


So after a breach, you have our (currently) 2 million hashes, and let's say you recover only the weakest few percent of the passwords, which is 60,000 known good passwords. Instead of owning 60,000 accounts, you now have 60,000 passwords, each of which is going to require on average one million attempts before you guess the correct username. Is this not self-evidently better?


Well, let's look at it realistically: http://arstechnica.com/security/2015/01/yes-123456-is-the-mo...

The #1 password out of 3.3 million was 123456, which was used 20,000 times.

So extrapolating that for your 2 million hashes, we'd expect the top password to appear roughly 12,000 times.

Running those numbers, we'd expect each guess to have a 12,000/2,000,000 chance of matching (about 1 in 167). Or more specifically, a 1,988,000/2,000,000 chance of not matching.

Working that out, we'd expect a 50% chance of finding a match after trying just 115 random usernames.
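A quick way to check those numbers (2,000,000 users and a top password shared by roughly 12,000 of them, as estimated above):

```python
import math

# Each random username tried against the known top password matches with
# probability 12,000 / 2,000,000 = 0.6%.
p = 12_000 / 2_000_000

# Tries needed for a 50% chance of at least one match:
# solve 1 - (1 - p)**n = 0.5 for n.
tries = math.log(0.5) / math.log(1 - p)
print(round(tries))  # ~115
```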

I'm not saying it isn't an interesting approach, I just don't think it's nearly as effective as if you encrypt the hash directly (which has no attack vector unless you can get the key).


You forget the cases where username=password (or (username+CleverP3rmutation)=password).


> you won't know what usernames they go with

But if you've got a list of all usernames (probably a relatively small number) and access to a running system, isn't it easy to just try each password against each user until you find a match?

The common practice of rate-limiting login attempts per username wouldn't help with that either.


Yes, but we currently have 2.1 million users, so that's still no small burden. And don't forget that's only after you've brute-forced the passwords.


It's not the security of the passwords I'm thinking about, it's whether or not you've really enhanced it much by obfuscating the relationship between password and username. I'm sure you've thought about this more than me, but I'm a bit skeptical, since if you have a lot of users, you probably have had to create systems to make the login process extremely efficient. If all of your users wanted to log into your system within a 24-hour period, could they? Maybe a week? If they could, then an attacker can attempt to log in with each username over the same period of time.


So you basically increased the cracker's workload by six orders of magnitude, which is equivalent to increasing bcrypt's work factor by 20. Cool!
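A sanity check on that equivalence (my own arithmetic, not the commenter's): bcrypt's cost parameter is a power-of-two work factor, so each increment doubles the work, and a factor-of-10^6 larger search space corresponds to about 20 doublings.

```python
import math

# Six orders of magnitude expressed as bcrypt cost increments:
extra_cost = math.log2(1_000_000)
print(round(extra_cost))  # ~20
```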


If you increase bcrypt's work factor, then legitimate requests take longer too.


Is there some reason that you don't just encrypt the whole column or table, though? It does everything you described, and additionally makes even the bcrypted hashes unavailable in the event of a db dump. It just seems like a half-measure where a whole measure is just as easy to implement.


Hmm, interesting suggestion. We wouldn't want to encrypt the entire row since we want to be able to query on specific columns (version in particular). My only excuse otherwise is that the goal was "disassociate the hashes from the users" not "encrypt the hashes." :) You're right that we could have done more, although whether that makes meeting the original goal a "half measure" depends on your perspective. :)


That's a really interesting idea. I was thinking it might have performance implications but I think that will only apply to the password validation. How do you make sure the mapping key doesn't cause collisions when you roll it to reset everyone's passwords?


That's a great point about rolling the mapping key. We would likely be dropping all the rows in the password hash table for such an extreme event anyway, though, since otherwise there's no way to garbage collect the orphaned ones.



