Separating Hyperplanes

Random Words

Matthew Martin 12/18/2013 05:05:00 PM

I know nothing about internet security, so naturally I'm taking xkcd pretty seriously.

The Incidental Economist has left me thinking about password security, in a world where just about everything we do requires passwords of escalating complexity, usually requiring some combination of lowercase letters, uppercase letters, and numerals.

Basically, xkcd.com's author Randall Munroe proposes that we toss out existing password requirements, and instead assign users a fully randomized sequence of three words. I didn't really understand Munroe's math, so here's my own.

So to start, we have to get a sense of how hard it is to crack a three-randomized-word password. As it happens, English is one of the largest languages ever to exist on planet earth, clocking in with at least 140,000 different words. But, it's hard to find all those words in one place, so here's a random word generator that contains a subset of 90,000 words. That's still a lot more words than you or I know. So here's the first three words our word generator produced for me:

Reeumlnthrone Elicitation Pace

If you are curious, "reeumlnthrone" means "to enthrone again." Purely by accident, I was at the Queen's recoronation at Westminster Abbey last summer, but I suppose that a recorronation isn't quite the same thing as reeumlnthroning.

We want to know the expected number of attempts a hacker would have to make to guess our password. So our random variable

X

is the number of attempts until the correct password is guessed. Assuming these attempts are independently distributed, then

X

follows a geometric distribution, where the probability of the

k^{t h}

attempt being a success is given by

P r (X = k) = {(1 - p)}^{k - 1} p,

where

p

is the probability of guessing the password in each (independent and identically distributed) attempt. So with 90,000 possible words, picking three at random, the probability that any particular attempt at guessing our password is successful is

p = \frac{1}{{(90, 000)}^{3}} .

I won't do the math here, but it turns out that

X

has an expected value given by

E [X] = \frac{1}{p} = \frac{1}{7.29 \times 10^{14}} .

which means that on average we'd expect a hacker to have to make 729 trillion guesses to get our password. Again, I know nothing about cybersecurity. But that sounds like a lot of guesses.

A useful way to put this into perspective is to ask how long the password would have to be to achieve at least as much security as Munroe's recommendation, assuming that we fully randomize each separate characters chosen from the 26 lowercase letters, 26 uppercase letters, and 10 numerals--which are usually the only characters passwords are allowed. So, that's a total of 62 characters we are picking at random from. Let

n

be the length of our password. Assuming a hacker attempting a sequence of independent and identically distributed guesses at our password, the probability of hacking this password is also geometrically distributed, with expected value of

{(62)}^{n}

. Thus, we are looking for

n

that solves

{(62)}^{n} = 7.29 \times 10^{14} .

Equations like these are the reason god gave us logarithms. The answer is 8.29... but since we can't have fractions of a character in a password, that means we need nine totally random digits to match the password strength of Munroe's method. For reference, most applications impose an 8-character minimum password length.

This is where memorability comes in. There are also random password generators that do exactly what I described above, but the password looks like this:

rsYAA8UHH

I've already memorized Reeumlnthrone Elicit Pace, having typed it only once. I doubt I could possibly ever remember this second password, no matter how many times I type it.

There are, of course, a lot of mathematical caveats here. For one thing, hackers don't generally make independent and identically distributed attempts to guess your password. At the very least, they'd be silly to guess the same password twice. I don't care to recalculate the probabilities assuming the hacker never guesses the same password twice--"without replacement," in probability theory lingo--but needless to say this would reduce the expected number of guesses somewhat. Second, expected value may not be the best metric of security. Another metric that comes to mind is how many attempts it would take to be guaranteed to have found the right password, which is just equal to the total number of password combinations possible. I did not do that because it is an upperbound--the absolute best case scenario--whereas what matters for security is really the lower bound--that is, whether there's a high probability that the hacker can guess the password well before then. Third, and perhaps most importantly, my analysis above was actually fairly skewed against Munroe's method. That's because I implicitly assumed that hackers have information about the password--the password consists of exactly three randomized words chosen at random from a particular database of words--that hackers in real life probably wouldn't have. In addition, I assumed that the alternative to Munroe's method is fully randomized characters, whereas in reality people tend towards relatively predictable passwords based on names, words, dates, and locations that are personally familiar to them. Thus, in real life how long it takes to guess a password depends a lot on the efficiency of the hacker's guessing algorithm.

But, I think a good rule of thumb is that a password consisting of three randomized English words is roughly equal to a fully randomized 9-character password. Of course, nothing is stopping you from picking a fourth word for your password...

Max 12/21/2013 05:20:00 PM

Hackers don't guess passwords anyway. It's much, much more important to use a different password with each service than to use strong passwords.