### Random Words

12/18/2013 05:05:00 PM
Tweetable
The Incidental Economist has left me thinking about password security, in a world where just about everything we do requires passwords of escalating complexity, usually requiring some combination of lowercase letters, uppercase letters, and numerals.

Basically, xkcd.com's author Randall Munroe proposes that we toss out existing password requirements, and instead assign users a fully randomized sequence of three words. I didn't really understand Munroe's math, so here's my own.

So to start, we have to get a sense of how hard it is to crack a three-randomized-word password. As it happens, English is one of the largest languages ever to exist on planet earth, clocking in with at least 140,000 different words. But, it's hard to find all those words in one place, so here's a random word generator that contains a subset of 90,000 words. That's still a lot more words than you or I know. So here's the first three words our word generator produced for me:
Reeumlnthrone Elicitation Pace
If you are curious, "reeumlnthrone" means "to enthrone again." Purely by accident, I was at the Queen's recoronation at Westminster Abbey last summer, but I suppose that a recorronation isn't quite the same thing as reeumlnthroning.

We want to know the expected number of attempts a hacker would have to make to guess our password. So our random variable $X$ is the number of attempts until the correct password is guessed. Assuming these attempts are independently distributed, then $X$ follows a geometric distribution, where the probability of the ${k}^{th}$ attempt being a success is given by
$Pr\left(X=k\right)={\left(1-p\right)}^{k-1}p,$
where $p$ is the probability of guessing the password in each (independent and identically distributed) attempt. So with 90,000 possible words, picking three at random, the probability that any particular attempt at guessing our password is successful is
$p=\frac{1}{{\left(90,000\right)}^{3}}.$
I won't do the math here, but it turns out that $X$ has an expected value given by
$E\left[X\right]=\frac{1}{p}=\frac{1}{7.29×{10}^{14}}.$
which means that on average we'd expect a hacker to have to make 729 trillion guesses to get our password. Again, I know nothing about cybersecurity. But that sounds like a lot of guesses.

A useful way to put this into perspective is to ask how long the password would have to be to achieve at least as much security as Munroe's recommendation, assuming that we fully randomize each separate characters chosen from the 26 lowercase letters, 26 uppercase letters, and 10 numerals--which are usually the only characters passwords are allowed. So, that's a total of 62 characters we are picking at random from. Let $n$ be the length of our password. Assuming a hacker attempting a sequence of independent and identically distributed guesses at our password, the probability of hacking this password is also geometrically distributed, with expected value of ${\left(62\right)}^{n}$ . Thus, we are looking for $n$ that solves
${\left(62\right)}^{n}=7.29×{10}^{14}.$
Equations like these are the reason god gave us logarithms. The answer is 8.29... but since we can't have fractions of a character in a password, that means we need nine totally random digits to match the password strength of Munroe's method. For reference, most applications impose an 8-character minimum password length.

This is where memorability comes in. There are also random password generators that do exactly what I described above, but the password looks like this:
rsYAA8UHH
I've already memorized Reeumlnthrone Elicit Pace, having typed it only once. I doubt I could possibly ever remember this second password, no matter how many times I type it.