
We have an approach to this which doesn't modify the password hashing at all, so it can't possibly reduce the strength: we store users and password hashes (currently bcrypt * ) in two separate tables, and the key used to look up a given user's password hash is an encrypted version of their user id. The encryption key for doing this transformation is loaded dynamically at app startup and has some extra security precautions around storing it, so it's not baked into either the source code or a config file.

The upshot is that if you get a db dump and are able to brute force some bcrypt hashes, you won't know what usernames they go with. If you get a db dump and our source code, you're still out of luck. If you get ahold of an old server hard drive, you're out of luck. If you root a running server and inspect the process memory, you can obtain this key.

This scheme also allows the mapping key to be rolled, which would immediately invalidate all passwords in the system.

*we also version our password hash records so we can migrate from bcrypt to a new scheme fairly painlessly if it's warranted in the future.
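A minimal sketch of how such a lookup might work. All names here are hypothetical, and the HMAC-based mapping is my assumption: the commenter says "encrypted," but any deterministic keyed transform of the user id gives the same lookup property.

```python
import hmac
import hashlib

# Hypothetical sketch of the scheme described above. The mapping key is
# assumed to be loaded dynamically at startup, never from source or config.
MAPPING_KEY = b"example-key-loaded-at-startup"  # placeholder for illustration


def password_row_key(user_id: int) -> str:
    """Deterministic keyed mapping from user id to password-table key.

    An attacker with only a DB dump (and no MAPPING_KEY) cannot link a
    cracked bcrypt hash back to the user row it belongs to.
    """
    return hmac.new(MAPPING_KEY, str(user_id).encode(), hashlib.sha256).hexdigest()


# users table:           id -> username, ...
# password_hashes table: password_row_key(id) -> (bcrypt_hash, version)
```

Because the mapping is deterministic, lookups at login time are a single indexed read; rolling MAPPING_KEY changes every row key at once, which is what invalidates all passwords.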



Well, let's imagine an attack scenario.

As an attacker, I get SQL access to your DB (meaning no access to the encryption key). I then download the user names, and the hashes. I then attack the hashes offline. I recover only the weakest few percent (since you're using bcrypt). But since the weakest few are those most likely to be re-used (both by different users and by a single user across sites), they are going to be both more valuable to me and easier for the next steps:

Then, I take the highest frequency passwords and the user table, and I start validating them online in your system. Now if I do that too quickly, you'll notice and I'll be shut down. And if I do that all from the same IP, I'll be shut down.

But what if I had a botnet I could distribute the load across? What if I kept my request rate small enough to stay under the radar of even a moderate-scale system?

I would expect to start seeing meaningful results within days.

If you had 1000 users, then I could surmise that you don't have much traffic, and hence keep the request rate down to perhaps 100 per day. In 10 days I'd have at least a few u/p combinations that I know for a fact worked.

If you had 1000000 users, I could ramp it up quite a bit higher, to perhaps 1000 or 10000 per day.

And since they all came from separate IP addresses, it could be rather difficult for you to tell an attack was going on unless you were looking specifically for it.

Does that mean you should stop immediately? No. It's not that bad of a scheme. But be aware that it doesn't give you (or your users) the level of protection that it may appear to on the surface.


So after a breach, you have our (currently) 2 million hashes, and let's say you recover only the weakest few percent of the passwords, which is 60,000 known good passwords. Instead of owning 60,000 accounts, you now have 60,000 passwords, each of which is going to require on average one million attempts before you guess the correct username. Is this not self-evidently better?


Well, let's look at it realistically: http://arstechnica.com/security/2015/01/yes-123456-is-the-mo...

The #1 password out of 3.3 million was 123456, which was used 20,000 times.

So extrapolating that for your 2 million hashes, we'd expect the top password to appear roughly 12,000 times.

Running those numbers, we'd expect each guess to have a 12,000/2,000,000 chance of matching (about 1 in 167). Or more specifically, a 1,988,000/2,000,000 chance of not matching.

Working that out, we'd expect a 50% chance of finding a match after trying just 115 random usernames.
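A quick way to check those numbers (2,000,000 users and a top password shared by roughly 12,000 of them, as estimated above):

```python
import math

# Each random username tried against the known top password matches with
# probability 12,000 / 2,000,000 = 0.6%.
p = 12_000 / 2_000_000

# Tries needed for a 50% chance of at least one match:
# solve 1 - (1 - p)**n = 0.5 for n.
tries = math.log(0.5) / math.log(1 - p)
print(round(tries))  # ~115
```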

I'm not saying it isn't an interesting approach, I just don't think it's nearly as effective as if you encrypt the hash directly (which has no attack vector unless you can get the key).


You forget the cases where username=password (or (username+CleverP3rmutation)=password).


> you won't know what usernames they go with

But if you've got a list of all usernames (probably a relatively small number) and access to a running system, isn't it easy to just try each password against each user until you find a match?

The common practice of rate-limiting login attempts per username wouldn't help with that either.


Yes, but we currently have 2.1 million users, so that's still no small burden. And don't forget that's only after you've brute-forced the passwords.


It's not the security of the passwords I'm thinking about, it's whether or not you've really enhanced it much by obfuscating the relationship between password and username. I'm sure you've thought about this more than me, but I'm a bit skeptical, since if you have a lot of users, you probably have had to create systems to make the login process extremely efficient. If all of your users wanted to log into your system within a 24-hour period, could they? Maybe a week? If they could, then an attacker can attempt to log in with each username over the same period of time.


So you basically increased the cracker's workload by six orders of magnitude, which is equivalent to increasing bcrypt's work factor by 20. Cool!
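A sanity check on that equivalence (my own arithmetic, not the commenter's): bcrypt's cost parameter is a power-of-two work factor, so each increment doubles the work, and a factor-of-10^6 larger search space corresponds to about 20 doublings.

```python
import math

# Six orders of magnitude expressed as bcrypt cost increments:
extra_cost = math.log2(1_000_000)
print(round(extra_cost))  # ~20
```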


If you increase bcrypt's work factor, then legitimate requests take longer too.


Is there some reason that you don't just encrypt the whole column or table, though? It does everything you described, and additionally makes even the bcrypted hashes unavailable in the event of a db dump. It just seems like a half-measure where a whole measure is just as easy to implement.


Hmm, interesting suggestion. We wouldn't want to encrypt the entire row since we want to be able to query on specific columns (version in particular). My only excuse otherwise is that the goal was "disassociate the hashes from the users" not "encrypt the hashes." :) You're right that we could have done more, although whether that makes meeting the original goal a "half measure" depends on your perspective. :)


That's a really interesting idea. I was thinking it might have performance implications but I think that will only apply to the password validation. How do you make sure the mapping key doesn't cause collisions when you roll it to reset everyone's passwords?


That's a great point about rolling the mapping key. We would likely be dropping all the rows in the password hash table for such an extreme event anyway, though, since otherwise there's no way to garbage collect the orphaned ones.



