What's the purpose of a username?

5/18/2015 08:01:00 PM

A project I'm working on has me thinking about security and identity management on the internet. Between all the interactive stuff and all the paywalls, just about anywhere you go on the internet you get this:

User name:
Password:

So here's my question: why do we need to enter both?

The obvious answer is that the password is what lets you into the website, but the server needs the username to look up your password in the database. Since the username is unique, this works--the server can look up only the password for your username, compare it to what you entered, and let you in if they match. Two users are allowed to use the same password, though, so looking up the password directly could produce multiple matches. But what if passwords were unique?

Of course, you can't have a password-only login that tells users whether a password is in use when they pick one, because that would be giving them complete access to someone else's account. But because very large numbers exist in the universe, it is nevertheless possible to issue passwords that are both totally random and, for all practical purposes, totally unique. Programmers do it all the time. For example, in C#, the statement Guid.NewGuid(); generates a "globally unique identifier," by which they mean a really big (128-bit) random number written out as a string of hexadecimal letters and digits. [1] Sure, you'd have to store all these passwords in a password manager--you can't memorize them--but really, who can memorize all of their passwords? If you aren't already using a password manager, you might as well consider all your accounts already compromised. And with a manager, it doesn't matter whether you get to pick your own password, and your life is the same whether those passwords are 8 characters or 80 characters long.
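To make the "issue random, unique passwords" idea concrete, here is a minimal sketch in Python rather than C#. uuid.uuid4() is the closest standard-library analogue of Guid.NewGuid(), and secrets.token_hex() is the module Python documents for generating secrets. The function names are made up for this illustration.

```python
import secrets
import uuid


def new_guid_password() -> str:
    """Return a random UUID (version 4) as 32 hexadecimal characters."""
    return uuid.uuid4().hex


def new_secret_password(num_bytes: int = 16) -> str:
    """Return num_bytes of cryptographically strong randomness as hex text."""
    return secrets.token_hex(num_bytes)


# Issue a batch of passwords and collect them in a set.
# A set drops duplicates, so its size shows whether any value repeated.
issued = {new_guid_password() for _ in range(10_000)}
print(len(issued), "distinct passwords issued")
print("example:", new_secret_password())
```

Either function hands the user something to paste into a password manager; nobody is expected to memorize it.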

Now, you don't just store a list of users' passwords in your database. Seriously guys, don't do that, regardless of what systems you use. Hackers will steal your database and have access to all the accounts. Instead, websites store cryptographic hashes of passwords in their database. Hashing is the process of converting some text into a fixed-size number, and we're using a cryptographic hash function that is one-way--it can convert text to hashes, but there is no practical way to convert hashes back to text, so if the hash is stolen the thief still won't be able to sign in to any of the accounts because he'd still need the unhashed versions of the passwords to do so. But that's ok because the hash function is also collision-resistant. It can't be strictly injective, since there are more possible texts than possible hashes, but finding two passwords with the same hash is so unlikely that if two hashes match, we can treat them as the same password. We can, therefore, hash the password that someone enters into the login page, then look for an entry in the database that matches that hash.

That is harder than it sounds. You can't just do a simple select...where lookup for the hashed input. That's because we don't store raw cryptographic hashes of the passwords in the database either. Seriously guys, don't store the raw hashes. It turns out that using a rainbow table (a precomputed table that maps hashes back to likely passwords) with all the usual hashing algorithms, it's still relatively easy to reverse hashed passwords and gain access to the accounts. No, instead we only hash salted passwords. Salt refers to the technique of mixing a random value, different for every account, into each password before hashing it, so that identical passwords produce different hashes and precomputed tables offer no help. The salt is stored alongside the hash so that we--the developers--can salt the user input on the login page the same way and compare the result to the stored hash. Unfortunately, however, this means that the only way to locate a particular password in a database is to go through the entries one-by-one, re-hashing the input with each entry's salt until a match is found. It's worse than that, because we can't merely stop at the first match, or abandon a comparison at the first byte that differs, because attackers can measure the amount of time it takes to reject a login to gain information about what is stored in the database. No, we have no choice but to iterate through every single part of every single entry--the most time-intensive method possible.

It turns out that even if we used globally unique and random passwords, salt makes a password-only login system fairly unworkable. Usernames actually provide two functions: They are globally unique identifiers, and they are also unsalted keys for fast database lookup.

I'm left to wonder why we treat usernames as if they are public information, not only stored unencrypted on the server, but often even published to all users on the site. Yet, because of their second, less obvious function of providing fast database lookup, knowledge of usernames represents real power for attackers, and a real security vulnerability for users. We can't salt usernames, but we can hide them from public view and keep them stored as hashes rather than plain text. In fact, we don't need to force users to manage their own usernames. Instead, all we really need is to implement password rules that require part of the user's password to be unique--perhaps just the first four characters of a 12 or more character password--and to store the hash of that portion unsalted. From the user's perspective, they now only need to enter one credential to access their account.

[1] Sure, in theory the random number generator could repeat itself--maybe the Imperator Intergalacticus a million years from now will need to promulgate an alternative IGuid method to ensure inter-galactically unique identification across Earth's sprawling billion-galaxy empire, but until then the probability of repeats is effectively zero.

Max 5/20/2015 06:50:00 PM
This makes me think of the legendary "numbered Swiss bank account". The number is secret and functions as both user ID and password (well, I could be wrong about that, it's just my understanding from watching movies...)

If you want, it's easy to generate random passwords that are potentially memorizable: