Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
How crackers ransack passwords like “qeadzcwrsfxv1331” (arstechnica.com)
254 points by co_pl_te on May 28, 2013 | hide | past | favorite | 118 comments


Enjoyable read, but I question the bit near the end claiming that salts wouldn't help much against this kind of attack.

From my understanding, per-user salting does substantially slow down this kind of attack because it forces you to calculate a different hash for each user/plaintext combination rather than hashing a suspected plaintext once and comparing the hash against the whole list. What it doesn't slow down is the brute-force cracking of a single targeted hash.

Am I missing something there, or is the article wrong?


I'm in the middle of researching re-evaluating rainbow table attacks in light of Moore's law, GPUs and Crack (lookup) tables, I've also looked into countermeasures.

What you're describing is partially correct. When you crack passwords (either with rainbow tables or by brute force) you generate an iterator or use a dictionary and work through this generating hashes (with rainbow tables this works via a series (or chain) of what are called hash reduction functions instead of a plain iterator).

You then compare the hash you've generated against all the hashes being subjected to the crack. Where there's a match, you win!

If you're targeting a single hash and it's unsalted, rainbow tables will work as they're precomputed. It's inefficient, but it'll work.

If you're using a salt, rainbow tables need to be pre-computed for the salt - I actually did this years back with a team for oracle 10g SYS and other default user passwords as the username was used as the salt.

If you're targeting a single hash and it's salted then the iterative or dictionary approach will still work in roughly the same time as the salt is known. Separating the salt means you have to brute force it which is a PITA but generally because the way hashes tend to be stored (remember we're talking about a mechanism that shouldn't really be used anyway) the salt needs to be accessible by that system so you can dump the salt out regardless.

Depending on where my current research goes it looks like some salting mechanisms may be feasibly pre-computable now beyond those that use predictable salts. GPUs allow you to create and traverse enormous chains in rainbow tables and while having to precompute per salt will be a PITA it looks like it is possible with sufficient storage and GPU power.


If you're targeting a single hash

I was going to mention this. If your user record has a column called "IsSuperuser", then an attacker is going to concentrate on those users, since the reward for cracking them is so much higher. Individual salts or not, with multi-GPU based brute-forcing, they won't stand long.


Indeed, the best resource I've found for adversely affecting brute-force password cracking comes from the *coin cryptocurrencies.

GPUs allow us to generate much much longer chains in rainbow tables than before. This means that there's a re-evaluation of space trade-off based on chain length, so simple salting mechanisms (e.g. 4 char alphanumeric such as Cisco Type 5) become feasible. For now they're not incredibly practical for full mixalphanum 1-8 at 99.99% probability but it looks possible.


Since rainbow tables are now functionally obsolete, perhaps the best use of a salt is not to add randomness, but to enforce a minimum length. So if your salt is 40 chars, even if a user puts in a one-character-long password they're still at a length of 41 and will get some protection.

Combine that with a skip + take algorithm (not just appending the salt and password together) where you interleave the salt and password, and you might have some improved defense.


Nope, a salt is public. Its only purpose is to force you to brute force one hash at a time. It might also protect you against some attacks particular to your hashing algorithm.

It does not slow down the brute force itself. A lonely salted one letter password is as weak as an unsalted one.


Given that an attacker is going to focus on high-value accounts if they can, slowing down a brute force attack on a per-hashed-value basis is pretty much your only defense. Adding complexity (like you would when defending against a rainbow table attack) adds little value when defending against a GPU-based attack that can crank out a billion hashes a minute. So what you can do is add artificial length to the password + salt combination by using a very long salt. By necessity, you'd have to use a better hashing algorithm that will have a lower chance of an attacker getting lucky with a collision, and maybe is "slower" than other hash algorithms.


Nope. The correct way to do this is to use a password based key derivation function like scrypt[1] or pbkdf2[2].

[1] - https://www.tarsnap.com/scrypt.html

[2] - http://en.wikipedia.org/wiki/PBKDF2


As long as there are unsalted hashes or hashes that use predictable salting mechanisms in use, rainbow tables are far from obselete, functionally or otherwise.


Or just do it based on usernames. It would be fairly easy to generate a list of high value usernames (reporters, admins, bloggers all tend to use consistent usernames), and then concentrate on those.


why would any portion of the record be visible to anyone who has not the security to unlock the record? As in, why should the structure be public?


What if a site uses 128-bit salts generated by a good (perhaps hardware-based) RNG?


That will remove pre-computed rainbow tables from the equation.


Only really for the time being (and a considerable time to come, Quantum and Biological computing enhancements notwithstanding) but for a salt that big it makes far much more sense to use a key derivation function as you're effectively generating two keys otherwise (one user generated, the other - salt - via PRNG).


The article clearly explained that using more salts was preferable to using a global salt and did a good job of explaining why. They didn't come out and say that a per-user salt is best, but that's the natural conclusion of the argument they developed.


Maybe we read different articles, but the one I read seemed intent on strongly downplaying the amount of extra security that salting provides. It actually uses the phrase "minimal amount of protection" to describe a salt's effect on cracking attempts.

That seems like a strange choice of words when salting would've completely changed the outcome of the cracking attempts described in the article.

Edit: To be clear, I'm aware that salting doesn't protect against targeted attacks, but the attacks discussed in the article weren't targeted. One of the major reasons[1] they were able to crack 16,000 passwords in a few hours was due to the lack of salting.

[1]: The other being the choice of MD5 rather than bcrypt/scrypt/PBKDF2/etc, naturally.


Salting wouldn't have changed the outcome of the cracking attempts in the article. The primary focus was on how pre-computed hashes (rainbow tables, etc.) are no longer a tool used by most people attacking password lists because GPU-based hashers are efficient enough, the dicts long enough, and the methods of permuting the dicts (combinations, leet substitutions, markov chains, etc.) are rich enough that storing all possible hashes is actually less effective than writing an MD5 kernel for your AMD GPU and letting it go, monitoring for trends in the output.


Every time figure quoted in the article (eg "Retrieved 2700 passwords in 2 minutes 30 seconds") would have been up to 16,000x larger if a salt had been used since a separate hash would need to be computed for each password rather than just one hash to compare to all 16,000 passwords. 2.5 minutes x 16,000 would be around 28 days to compute the "first pass" alone.. much less feasible, especially as the parts that took multiple hours would take several years to finish. It's true you could still crack the same percentage of passwords eventually, but being able to crack 90% in 5 years or in 20 hours is a very different outcome.


Or 2.8 days if you go out and buy 10 GPUs for a couple of grand. And practically, how often are people really trying to crack all the hashes?


The point is that a salt would essentially give you a constant speed of cracking versus a speed that scaled with the size of the dataset you have. For example, the guy in the article cracked 10,000-ish passwords in 16 minutes - 600 passwords per minute. If he'd had a dataset of 1.6 million, he would have cracked over a million (assuming the passwords were of the same quality and he has some good way of storing them on his hardware to allow fast lookup), at a rate of 60,000 passwords per minute, all on his single GPU.

However, with a salt he'd be limited to about 1 password per every 27 minutes. And this rate would not increase as the dataset grew larger, which means that even if he had 10x the power as you suggest he could only crack about 20 passwords per hour (more or less). Thus to crack even 1% of a 1.6 million password dataset (16,000 passwords) it would take over a month of non-stop computation on 10 GPU's, versus a single GPU computing 62% (again over a million) in 16 minutes.

So even if a compromised website took a month to realize the leak, it would be able to inform it's users in time such that 99% of them would be able to change passwords and avoid their accounts being compromised in time. With no salt, no such luck since as we can see here up to 90% of the dataset would be cracked within a day.


But with a per-user salt, they would have to attack each password individually, using each of these techniques with each given salt and hope for a coincidence with the stored hash, rather than run their techniques and compare the resulting hashes to the stored ones? Something like:

Global hash:

  while(s = guess()) {
    if (md5(s,salt) in list) {
      print s;
      delete(md5(s,salt),list);
    }
  }
Unique hash:

  while(s = guess()) {
    foreach(h in list) {
      if (md5(s,salt)==h) {
        print s;
        delete(h,list);
      }
    }
  }
The former requires one md5 calculation and one lookup per guess, the latter needs n md5 calculations per guess. That certainly is quite a difference, assuming the bottleneck to be md5.


Where/how do you store the salts?


Right next to the hashed passwords. The point of salting isn't to add an additional level of secrecy, it's just to prevent the reuse of hashing work for attacking other users.


But doesn't this render the process useless? If an attacker gets access to the hashes, he also gets access to the salts.

If both hashes and salts were isolated, I suppose it would be much more secure, although maybe too slow.


Absolutely not. Like I said, the point of the salt is not to provide a "second password" that needs to be independently stolen. It's to make the results from a cracking attempt on one account useless on the password of another.

If you've read the article, you have a pretty good idea of how it works without salting. You have a list of N password hashes, and you come up with candidate passwords. You run each candidate password through the hash function, and compare the resulting hash to your list. If it matches any of them, you've got the plaintext for those user accounts. The key thing here is that you only have to hash each candidate password once, and you can compare that hash to as many accounts as you'd like.

Now, let's add in a salt. It's stored next to the password, so after stealing the password db you've got a list of password hashes, each with its own salt. In order to check a candidate password against a particular user account, you have to append that user account's salt to the candidate, then hash it and compare against the stored hash. That part isn't any slower. The problem is when you try to take that hash and compare it to other accounts. Even if two users have the same plaintext password, their password hashes will be different because they have different salts. The end result is that you have to hash each candidate password fresh for every user.


That's what I understood from the article, and why I suggested the salts and hashes should be handled separately.

If you give the attacker the salts, difficulty scales linearly. For two identical plaintexts, all that differs is the salt, but it is given to him.

If you store the hashes and salts separately (the technical details of which I know nothing about), then you augment the keyspace exponentially. For N byte salts, using P characters, you augment it by N^P. Furthermore, who forces you to append a salt? You could prepend. Or n-pend.

I'm just throwing ideas around.


The server must have access to both the salt and the hash to verify a password. Therefore, upon compromise of the server, the attacker automatically has access to both the salt and the hash. There is no way around this problem that isn't simply obfuscation.


Not necessarily. With something like a smartcard or TPM chip, one could move the hashes and salts off the server. Both would still be stored together (on the device) but they'd be separate from the server!

Edit: Or one could move just the salts into the device and store an index into them in the password file. Hashing would be carried out by the device but an attacker would only gain access to the indices. Without stealing the device itself it'd be impossible to properly salt the passwords for hashing.


If you have a dedicated cryptography module, you would be better off storing encrypted hashes, and asking the cryptography module to decrypt the hash and verify the password against it. Salting the hash (and using an expensive hash) would still provide a second layer of defence in case the module is compromised (and provide a physical rate limit to online attacks even if the server is compromised).


Hm. I suppose a hacky-ugly-cheapo way to do this would be to run a daemon under root which accesses a local sqlite db or whatnot (and that file is only reachable by root). When a web/whatever app needs to check a hash, it asks the daemon whether this user + this hash are valid. This could be done e.g. using unix sockets, and the daemon could do rate limiting (one attempt per two seconds for a specific user, etc). The exchange could also be encrypted using a pre-defined key, so having access to the socket wouldn't even let you sniff the exchange.

If a cracker gets access to the web app, they'll be able to monitor the exchange, but they'll need actual privilege escalation to be able to read/dump the whole hash database.


If the contents of the device can be dumped, the problem remains. If not, you have no copies of your database, so when (not if) the device or the server it's connected to fails, your site is down, and once you've brought it back up, all of your users must go through a password reset process.

That one server/device is now a single point of failure and a bottleneck in processing logins.

You're also still relying on the security of a computer, just a special-purpose one.


I didn't say anything about reliability or single points of failure. I merely pointed out that it was possible to separate the salt from the hashes and gain security that way. Whether this is practical or not depends on how important security is to you.

And yes, it would not be possible to dump the contents of the devices.


The proposal is fundamentally impractical, and thus not a "security gain" in any meaningful sense. It's the equivalent of preventing cipher algorithm breaks by using nothing but one-time pads.

It's also theoretically impure in any case, as you've done nothing but add an additional peripheral to the computer. You're seeking obfuscation, not real cryptographic integrity.


No, using an HSM to store a secret is done widely (in banking, for PINs), and it's entirely possible to implement them in a way where individual device failures can be mitigated.

The only issue is cost of HSMs; they're about $20k/ea right now, since there are only two significant vendors, and they're not widely used.

If someone wanted to do "HSM for general purpose web login, to eliminate the DoS potential of scrypt, and the brute force hash db problem of anything else, and the idiocy of plaintext", the price could probably drop down to $500 or less.


>It's also theoretically impure in any case, as you've done nothing but add an additional peripheral to the computer. You're seeking obfuscation, not real cryptographic integrity.

It's not obfuscation. The peripheral has a far smaller attack surface than a server. This is real security, even if it comes at a cost of reliability (though one can envision ways of fixing this, too).


It's operational/system security. That's not the same thing as cryptographic security.

Stop trying to patch a hole that isn't there. Salt is not secret data. If you want to protect the hash with secret data, take A1kmm's advice and use the smart card to encrypt it. But don't call that a salt, because it fundamentally is not one.


You can't make the difficulty scale more than linearly by number of passwords. If you could you'd drop back to tackling the problem in pieces and be back to linear scaling with the number of pieces.

What's making it hard for you to analyze this is that you're using the wrong terms. The secret stuff is password - the non-secret stuff is salt. If you want two secret bits, you want two passwords, or to break the password into two pieces for separate storage.

But after all that, you still want a salt, because you're not using it for secrecy but for ambiguation of identical plaintexts.

But splitting the password hash is a bad, or at best neutral, idea because passwords are likely to be semi-human readable and thus a guessable password that matches hash1 is likely the actual password, and will match hash2. This wouldn't be true if we used random passwords, but we don't. So splitting the hash is mostly totally ineffectual, as is having two separate hashes of the same string - the attacker usually doesn't need to examine both.


The point of the salt is to prevent precomputing the hashes (a rainbow table). Per-user salts take this a step farther, so that even if two users choose the same password, they will have different hashes.

What the article really points out is what people have been saying for a while now, that if your passwords are exposed, salting really doesn't add more than a roadbump to a cracker if you are computing your hashes with fast message-digest algorithms like MD5 or SHA.


I dunno... in the example there, I'd say that it costing 16000x as much to crack the passwords would be a little more than a "roadbump". (assuming people are renting {C,G}PUs in the cloud to perform the attacks)


A salt is not for adding extra secrecy. It's for making it take more work to solve a large number of passwords at once. It doesn't need to be stored separately to do this.

What you're thinking of is a secret-splitting scheme; where you split a secret among multiple parties, where you need the information from both parties to reconstruct the secret. That's a valid cryptographic technique, but that's not what's meant when you say "salted password".

The problem is, for passwords, that approach isn't really workable. Whatever machine is verifying the passwords will need access to the full hash to do so, so if it's compromised, you could get access to the full hash.

I suppose you could split the verification into two parts, on two separate machines. Each one would have it's own password database, with independent salts, and the machine trying to authenticate the user would request verification from each of the other machines before allowing you in. However, that would make your password system slower and more fragile; now if either of those machines goes offline, your login functionality breaks (and yes, you could do a "two out of three" system to avoid that, but now you're talking about tripling your hardware requirements just for your password verification system).

Really the best bang for your buck will come from (a) securing your system so that it doesn't reveal your password database in the first place and (b) using bcrypt, scrypt, or PBKDF2 to make brute forcing a database that is disclosed much more difficult. It's far more likely that you will build a secure system by following basic security practices like isolating services and giving each the minimum privilege that it needs, doing code review, keeping inessential services off of important machines, using prepared statements rather than building SQL queries from user-supplied strings, and so on, as well as using off-the-shelf security components that have been well tested like the above hash algorithms, than you will by designing your own password system using some novel technique to spread the secret out among multiple machines.


There actually has been research on how to split passwords across multiple servers (one example http://www.passwordresearch.com/papers/paper270.html), and RSA is currently marketing a commercial product that does this. While I agree that most people probably aren't concerned enough about password exposure to do this, there are effective solutions to accommodate those that are.


The salts provide their primary benefit (rendering rainbow tables useless) even when the attacker possesses the salts and the hashed passwords.


Without salts: Generate table of hashes vs. Plaintext for all plaintext up to certain length. Maybe use rainbow tables. Now, for each entry use this table. Time to get plaintext of n hashes is constant.

With salt: Generate table of hashes vs. Plaintext by appending salt to each possible plaintext of given length. For first entry use first salt to generate this table. Table is now useless for second salt if it is different. Time to attack n hashes scales linearly.


Salting would definitely make a difference. The point of salting is to increase the complexity and overhead of generating rainbow tables.

The first mistake here is the use of MD5. MD5 is flat out broken. Do Not use md5. Do use bcrypt.

In your db, you can store the salt right there....next to the hash...

How a salt increases the complexity is quite simple. If a attack has a giant rainbow table, the salt causes him to increase the size of his rainbow table and the space required to hold it and the pre-processing time to generate it. For example

Here is what your database might look like with a 3 byte salt

password : salt

hunter2 : 100

qwerty1 : 101

Here is what the attacks rainbow tables will have to look like

qwerty1 : 101

hunter2 : 100

qwerty1 : 100

hunter2 : 101

hunter2 : 103

qwerty1 : 102

...

..

.

That's about it. There are other purposes of the salt too but other people talked about those ones.


If anyone doesn't get the "hunter2" reference: http://www.bash.org/?244321


Right there in the same field as the password:

<salt>::<hashed_password>

The pepper on the other hand shouldn't be in the db. Hardcode that into a config file.


"Pepper"s are essentially meaningless and provide no real benefit over a salt.

And you should be using bcrypt anyway.


> "Pepper"s are essentially meaningless and provide no real benefit over a salt.

Citation needed. There appears to be a case where it could prove to be an advantage: http://security.stackexchange.com/questions/3272/password-ha...

> And you should be using bcrypt anyway.

Yeah, except bcrypt isn't always an option. Eg: on Google App Engine.


Yeah, that's why I said "essentially" instead of "totally". Writing your own MAC is not a good idea, pretty much full-stop. tptacek has talked about this[1] before. In the sort of environment that necessitates stupid hashing, if your database is owned your app is owned anyway.

If bcrypt isn't an option, you straight-up need a better platform. It's inexcusable to refuse to protect your users--if you're not using bcrypt/scrypt/PKBDF2 you're mistreating your users.

[1] - https://news.ycombinator.com/item?id=5663818


I know at least one major security company (that deals with Fortune 500 companies) is recommending having an extra key inside the app server, distinct from the database, since there are situations where someone can get the a copy of the DB and not your server.

The fact that they're big doesn't mean they're right, of course. But I know I've seen a bunch of news stories about DB leaking and I haven't seem a bunch about the source code to the website leaking. (Obviously there is some confirmation bias there.)


For sure, and I've done similar myself in the past, but let's be real here--they actually have app/DB separation in those cases. They're definitely not running on Google App Engine, where it's much more likely that they're getting data through an exploit in your app than by getting ahold of the database.

For the 99% case, 'pepper's are illusory security.

And you should still be using bcrypt. (Or PKBDF2, or scrypt, whatever. Just not something where a "pepper" actually ends up mattering.)


> And you should still be using bcrypt. (Or PKBDF2, or scrypt, whatever. Just not something where a "pepper" actually ends up mattering.)

As I said, you can't use any of these on Google App Engine. They're not provided by the platform, and you can't upload them yourself because they use C.


Right, so, like I said, don't use deficient platforms. Your users' security is more important than your ease of use.

And 'peppers' don't do anything of significant value in such an environment, either, because your app's going to get owned before your database is.


The article explains how all the passwords were revealed except for "qeadzcwrsfxv1331". Maybe I missed it but could anybody point out to me which method revealed this password? The letters seem to not form any words that I know, and although is appended with a number and consists of only lower case letters, it's still 12-letters long, longer than any brute-forced in the article. Edit: They tried all the keyboard typing patterns I suppose, I just realized the pattern of the string when typed on the keyboard.


Try looking at the pattern the letters form on the keyboard...


Good catch, but it could also simply be that the password had been collected during a previous hack.

The article mentions it at the end but I think they should have insisted more on this point: if you have a very strong password that you reuse everywhere and it gets leaked at some point it has a high probability to end up in rainbow tables everywhere and might not be more secure than "h4x0r1234".

So using hard to guess passwords is the easy part, the hard part is using different hard to guess passwords everywhere.


Could you explain the rainbow tables comment?

I would've thought it'd end up in a dictionary, not a rainbow table. (although I have to admit I've forgotten the details of how rainbow tables work)


Yeah thanks, I literally realized that as I hit reply on that comment lol


The punchline:

> The list contained 16,449 passwords converted into hashes using the MD5 cryptographic hash function.

Edit: I'm removing all my snarky nitpicking. This is a good article. Yes, the MD5 case they present is a poor case, but it's really about demonstrating the tactics of attack selection, rather than teaching someone how to make crack-resistant password schemes.


Right. The dialog with the crackers, showing how they use Markov chains, recombination of already-found passwords, and other adaptive techniques was new to me. They were able to turn up lots of seemingly obscure passwords very fast.

Basically, the strength of passwords can look really good on paper (96^12 possibilities or whatever), but in practice the entropy is much less: individual characters are not as independent as we'd like to think, and your clever word or phrase may be in a dictionary already (or may be a short Levenstein distance from something in a dictionary).


> the MD5 case they present is a poor case

If guys using vanilla hardware get that kind of success in 1 hour with MD5, you only need to increase hardware and the time required to see it's still completely doable for other hash functions.


The issue is practicality. Which is why pbkdf2 (and increasing the rounds each year) + bcrypt or scrypt is still a better option.


Noob question, but why would you use both pbkdf2 and bcrypt/scrypt? Aren't they all basically doing the same thing?


Bcrypt is an algorithm which bundles up something which pbkdf2 achieves by iterating other algorithms. They're basically the same, but you should use bcrypt. You shouldn't use both of them, I'm guessing the poster above meant / rather than +. If you want more security, increase your (b|s)crypt work factor.

If for some reason you can't bring the bcrypt code into your project, you can implement pbkdf2 using basically a loop and your stdlib's hash functions.

Scrypt takes the whole concept of placing extra demands on the computer and applies it to the RAM rather than the CPU (perhaps as well as?,) the idea being that RAM is harder to accumulate in obscene quantities than CPU power.


That's why the Blowfish algorithm is the best choice for password hashing.

The algorithm uses CPU cycles to generate the hash. So there is no way to speed it up, just by using a faster computer.


A faster computer does speed it up. It can go through all the cycles faster.

An even better option is scrypt, which is memory based, which is much harder to scale.


Which is why passwords just shouldn't be stored with simple hash functions. Even a naïve salt+iteration method would have drastically slowed the attack.


Right.

Using MD5 hashes was done as a relaxing constraint -- at the other end, the attackers had tightened constraints by getting limited time and computing power.

The goal was to show how attackers work and think.


My big takeaway from this article is that passwords, in almost any form, are a bad way to secure your information. The only acceptable way to use a password nowadays is to use a password manager to build huge passwords that a human could never remember or type in reliably. Even then, as machines get faster and crackers get smarter, these behemoth passwords will fall.

I've been using 2-factor authentication (Google Auth) lately and I'm fairly impressed with it. I only have to whip out my cell phone a couple of times a day/month so It's fairly convenient, and it seems very secure. Then again, I thought my password was fairly secure, but after reading this story and the HN comments, I can say that I'm just like everyone else when it comes to passwords. Ignorant. :/


The "exponential wall" means the longer your password is the less likely it is to fall. A 10 letter password is in a 26^10 keyspace. Add one more letter and it takes 10 times longer to crack -- assuming of course your password is not part of some of some combination of short common dictionary words.

What I find really interesting is that the same kind of attack vector (combinations of common words) is being used as the basis for some really sophisticated search techniques in Artificial Intelligence. I remember reading an abstract a few years ago from a student of Rich Korf @ UCLA. In it the authors use this kind of approach to attack the 24-Sliding-Tile Puzzle.

Here's a link; it's pretty cool! http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/349...


not 10x longer, 26x longer (and that's assuming lowercase a-z only)


Even then, as machines get faster and crackers get smarter, these behemoth passwords will fall.

This is fairly trivial to show is false. A 256-bit password that can be checked at one clock-cycle per iteration with 1 million cores running at 30GHz will take 1.2e53 years to crack[1]. If you generate it by base-64 encoding a random 256-bit string you will end up with only a 12 character password (hardly a "behemoth").

[edit] It's 1.2e53 years to exhaust the search space; you can expect to crack X% of the passwords in X% of the time (or have an X% chance of cracking a single password in X% of the time)

[1] https://www.google.com/search?q=2%5E256%2F(1e6*30e6%2Fs)#scl...


One note here - base-64 encoding a 256-bit string would result in a 44 character password (43 without padding).

It would be pretty cool if a 12 character password gave you 256 bits of entropy though :)


You're right; I did 256/8 in my head and got the obviously wrong result of 8 rather than 32.


That's pretty interesting. It makes me even happier that I've switched over to using passwords like .!a7&Xn2\#ZRS]!:`\.H2j;{7 for everything (unique per-site, password manager). I'm guessing that such a password, when properly generated, isn't going to be cracked without a quantum computer...

But hey, maybe someone can prove me wrong. Anyone care to take a crack (pun intended) at 1a323a3185b1dbee3a0ba1f4c3c9f674

I'll send an entire shiny bitcoin to anyone who can get it right.


I don't understand the statement that salts get less effective after you've broken other passwords:

"But the thing about salting is this: it slows down cracking only by a multiple of the number of unique salts in a given list. That means the benefit of salting diminishes with each cracked hash."

A proper salt for user Joe's password does not have any relation to any other user's salt. Cracking Bob's password should not help you crack Joe's. Am I missing a technique that exploits one salt to attack another? Or are they assuming crappy salting methods? As in, if you have 2 bits of salt, then after the attacker has hashed your entire passwd file with those 4 salts, you might as well not have salted anything.


I think they misphrased that a little, unless as you say there really are people commonly using non-unique salts. The krux of the argument is that each cracked hash removes one hash from the list of hashes you need to check. So, say you've got 16000 uniquely salted MD5 passwords. 8000 of them are, though, salted MD5s of 'password'. If you start off by checking weak passwords, your first pass requires you to calculate 16000 hashes, but if your first candidate password is 'password', then it yields 8000 passwords. Your next pass only has to calculate 8000 hashes for each candidate password. So in that case, cracking Bob's weak password does help you crack Joe's strong one.


I think what he means is that every password you crack is one less salt you have to hash with. Once you've cracked half the passwords, there are only half as many salts you need to hash with.

However I think he's underestimating how much strength this adds, it would have delayed that 1 hour to get 62% of the passwords to probably a few thousand hours.


If the original list had x salts, and you crack half of the entries, then you only have x/2 salts left to check, hence cracking goes twice faster.


How about passwords not English words but written using Latin alphabet ? Like mer@s@nket!k$habd (Hindi for "my password") ?

Bet that would be harder to crack and still easier to remember for a multi-lingual person.

In fact if non-latin alphabet is widely supported for entering password, it would make them a bit more secure I feel.


It's dangerous to choose anything someone else might choose. My favourite technique for good passwords is to think of a memorable phrase (preferably with numbers) and take the initial letter of each word.

"I can sleep at Nandos, providing I pay $100 for that"

Ics@N,pIp$100ft


Yes, the non-latin alphabet in particular probably helps a great deal, if you use a substantial portion of it. The Hindi words probably less so, since they are just as likely to end up in dictionaries of common password fragments.


Anyone know where the article's author got the "MD5.txt" file (the list of hashes he ended up cracking)? I'd like to take a crack at this (pun intended) myself.


Here's what it says in the "How I became a password cracker" article (http://arstechnica.com/security/2013/03/how-i-became-a-passw...) on Ars from March:

"Dan suggested that, in the interest of helping me get up to speed with password cracking, I start with one particular easy-to-use forum and that I begin with "unsalted" MD5-hashed passwords, which are straightforward to crack. And then he left me to my own devices. I picked a 15,000-password file called MD5.txt, downloaded it, and moved on to picking a password cracker."

My guess is that the forum he mentions was the insidepro.com web site, since it is a popular password hash sharing site. A quick search there found several attached files named MD5.txt, but none seemed to be the right size or have the right hashes. However, a search for hashes mentioned in the article found this file (http://forum.insidepro.com/download.php?id=12783&sid=160...), which contains 16,880 (instead of the 16,449 hashes listed in the Ars article) and at least 3 of the MD5 hashes specifically mentioned in the article.

There's a good chance this is the same file or close to it, but you'd have to try matching it with more of the hashes from the article to know for sure.


Cue a test-how-strong-your-password-is service where security conscious individuals can test how their particular password stands up against these new attacks.


And have their password added to a word-list.


There is a site that pretends to do just that, though I can’t find its URL. It asks you to enter your password for strength checking, then takes you to a page saying “estimated strength: 0. Because you have typed the password into an untrusted web page, you must assume it is compromised.”


Has anyone found this? It sounds great, but I couldn't get a URL from googling the obvious stuff. Please post if you know it!


http://www.inutile.ens.fr/estatis/password-security-checker

The Terms and Conditions are worthwhile, too.

EDIT: Also, see http://www.ismytwitterpasswordsecure.com/ (needs Javascript)


Superb. Thanks!


It's not really what you described, but this is quite jolly: https://howsecureismypassword.net.


LastPass includes a "security check" that does some level of checking, but they don't appear to provide details on their "strength of your password" check - it runs in the browser so presumably they're only doing some low-level checks.


zxcvbn (https://dl.dropboxusercontent.com/u/209/zxcvbn/test/index.ht..., https://github.com/lowe/zxcvbn) basically has this aim. It might be missing some patterns exploited in attacks, but it’s extensible enough that they shouldn’t be too hard to add.


I think my favorite part of this is learning that after I finish with my bitcoin mining rig, I can use it to crack passwords. Awesome...


If it's a specialised rig then it'll only be good for MD5 I think. Most sensible websites don't use MD5 anymore.


The work function in bitcoin is based on SHA-256.

But vinhboy is almost certainly talking about some computers with some GPUs in them.


Most sensible websites don't use MD5 anymore

What makes you say that? Do you mean new sites or all site? Got anything to back it up?

I would guess most sites in existence use MD5, SHA1 or similar with 1 round, because it was (is?) very popular.


Bitcoin rigs are only good for certain mathematical functions. They are not general computing devices.


How about this - User's password is a single letter "a". I MD5 it and get a 32 char string, now I append a GUID as salt - another 32 chars - and I MD5 that 64 char string again and store it. How difficult would it be to crack it? If the cracker knew my process he might crack it, but what if the process is not known to a cracker? Also I can store the hashed string and the salt in a way that a crakcer merely by looking at the string can't figure out it's actually salted.

Say my final salted hash is 1234 and the salt I used was ABCD. I can splice the two and store it as 1A2B3C4D. That would throw a cracker who does not know how the value was generated off the hook won't it?

Unless the crakcer also gets access to my code, it will be impossible for him to find head and tail of such hashes.


Well, you're hashing and salting, which is at least a good start. But you shouldn't use MD5- if the cracker does find your secret formula, you're hosed compared to pbdfk2/bcrypt/scrypt. Not only for the speed, but because MD5 has exploits that make it easier to produce a specific desired hash than the other options, so your attacker doesn't have to make as many guesses.

You should use a per-user salt, though, instead of a static salt (the 32-char nonce). The static nonce means the attacker can generate a rainbow table during their attack of all possible passwords. The hash of the password does not count as a suitable salt, because if User A uses the password 'a', and so does User B, both users' hashes come out the same. Finding User A's password means you can compare its hash against the rest of the compromised hashes and note that you grabbed User B at the same time. With a per-user salt, the users will have different hashes despite using the same password, so the attacker has to attack each hash separately, with no "quick wins". A randomly-generated salt will do, and I expect simply prepending the username or something to the password before hashing would do as well, singe it's still unique per user.


Yes of course I meant per user salt. And if I store a 64 char string it will be very difficult for an attacker to figure out MD5 was used.

And since the string that I MD5ed is 64 chars in length (32 of original MD5 + 32 of salt) it should make it near impossible to crack.


Relying on a terrible hashing mechanism just because you hope nobody will realize you're using it is a very bad idea. There's no technical reason to use MD5 over at least looping over a good SHA variant, and if your code has access to bcrypt/scrypt, it's hard to come up with a reason to choose anything (except maybe pbkdf2) over them, either.

Further, a string that long might be effectively impossible to crack /by brute force/, but MD5 has been found to have flaws that allow the attacker to find alternate inputs that will produce the same hash in a very short amount of time (minutes to hours, according to Wikipedia[0]). That means if the attacker has your hash, they don't need to figure out your nonce or salt to come up with something they can use to trick your system into authenticating them as an arbitrary user. These attacks don't reveal the user's original password, but they do still allow someone to impersonate an arbitrary user on your site, for e.g., purchasing services, accessing financial details from your billing system, or performing social engineering attacks.

[0] http://en.wikipedia.org/wiki/Md5#Security


Well the bonehead hashcat program is limited to 15 character passwords, so it can't possibly crack mine.

The hashcat program software in general is powerful, though pretty crappy from a design standpoint. The current version crashes if any input is not what it expects, and it has just about zero help for what it expects. Actually I can't even get it to work, as executing hashcat-lite.exe with arguments causes it to just immediately exit without doing anything nor printing any output.


    *Well the bonehead hashcat program is limited to 15 character passwords, so it can't possibly crack mine.*
One of the main themes of the article is that the sites are the weak point, not necessarily the users. The first boo-boo is storing passwords in an easily discoverable format (plaintext, md5, sha1, etc.).

Another massive annoyance is when a site has an insipid constraint on passwords. No longer than 8 chars because that's what they put in their db last century or based it on passwd, minimal symbols, and basically just alpha-numerics.

That's why your password manager has the controls on length, number of symbols, digits, etc. It's not to "smarten up" your passwords but to dumb them down for sites that give no priority to security.


The best part of the article is the last part about Intel password strength "guessing".

The other day some devs on reddit were using google to find password strength and posted stuff like my password would take "1 billion years to crack", without realizing these sites have garbage algorithms for measuring strength, the Microsoft one is a joke.

If developers are falling for these strength indicators, I'm certain average users are getting a very false sense of security.


In addition to per user salt, it would be good to also add "per application" salt. A random, long string, that is stored somewhere in the application code and not in the database.

Obviously this does not help if the whole site is compromised, but it may help against SQL injection attacks where the attacker gains access to just data from database.


Maybe we should tell users to use passphrases instead.

The 8 char limit is very low now. The limite should be raised to 30 characters.


Telling people to use passphrases is a great recommendation, but you still have to spend some time teaching them how to use passphrases effectively. In the article they list several passphrases that were cracked, such as "sleepingwithsirens", "gonewiththewind1", and "momof3g8kids". So if a user chooses "ijustbluemyself" thinking that it's a great choice they are likely to be disappointed if a skilled cracker gets access to their password hash.

My basic suggestions are to choose 4+ words that are not a common phrase, song name, quote, etc. And make sure you still use both lowercase and uppercase letters, plus throw in some symbols in non obvious places (e.g., don't convert your a's into @ signs). It doesn't need to be as random as a shorter password, but it still shouldn't look like a normal sentence.


well it's either that or security cannot really be guaranteed.

Maybe one day password will just be 1MB files kept on people's computer instead of remembering a short string.


'Taekwondo1983' Was the password a fellow admin of an old forum I used to run. Wander if it was still him.


Probably not, 'Taekwondo1983' sound like a password some percentage of people who practice a certain sport and are born in a certain year would choose.


Combinator attacks guess "the square of the number of words in the dict" when the password is just the combination of two words, correct? So a 3-word pass would require dict^3? This all seems to come back to length.


Pretty much, yeah. This is really just a generalization of the principle we see in the length of a binary password. The complexity is s^l, where s is the number of symbols you can possibly draw from, and l is the number of symbols in the password.

Consider a binary string. There are two symbols in binary (1 and 0) and the number of symbols you use is the number of bits in the string. This gives us the familiar password length calculation: the complexity is 2^l.

When you start using words as the basic unit, you gain many more symbols: if you use the full Oxford English Dictionary, you have about 600,000 of them. But when you do this, your symbol is no longer a single byte: it's a word. That word can be represented as a string of bytes, yes, but it's guaranteed that this string will be one of the 600,000 byte strings that represent English words, and you can eliminate all of the others. So you can't measure your password in bytes anymore; you have to measure it in words instead. A one-word password is, therefore, of complexity (600,000^1).

Six hundred thousand possibilities sounds like a lot, but computers can crack that in seconds. Even if you restrict yourself to just printable ASCII, you can do better than that in only three characters. I don't recommend actually doing a three-character printable-ASCII password, because you'd have just under 900,000 possible passwords: still well within crackable range.

But because the complexity increases so fast as you increase the length of the password, that's how you can gain complexity back. If you use four words instead of one, you have (600,000^4) possibilities: some 130 sextillion. You'd need 70 random bits, or 11 printable ASCII characters, to do better.

And that, my friends, is why "k[0" is a better password than "antidisestablishmentarianism". But it's also why "correct horse battery staple" is better than either one (or would be, if xkcd hadn't made it so famous).


Related: Stitcher (still) stores passwords in plain text https://news.ycombinator.com/item?id=5778367


I don't know too much about crypto. Would using something like salsa20 stop this?


A password-derivation function has two requirements:

1. Obscure the password in a one-way fashion.

2. Be fast enough for humans, but very slow in computer time.

Hash functions can be used to produce an obscured copy of data, but they are also by design very fast.

If you wish to protect passwords, don't use a naked hash algorithm. Choosing a different hash algorithm doesn't fix the problem.

Instead, use a password-derivation function. These are designed both to obscure and to take significant time. Bcrypt, Scrypt and PBKDF2 are the standards.


And of these three, scrypt is by far the best choice because it's the hardest one to speed up with specialized hardware.


Right. Colin Percival did a clever thing and made it memory-hard, not just computationally hard.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: