How (not) to store passwords

Validating and authenticating the users of an application is one of the major areas of cybersecurity. Storing passwords correctly is critically important, yet it is an area in which many developers fail. This is partly because the subject can be quite technical, mathematical, and extremely easy to get wrong, and partly, sad as it is, out of mere laziness.

In this article, I will go through different ways one might approach storing passwords for a website or an app. Some are preferable to others.

Store passwords as plaintext

So, you’re creating a login system for your website, and you need to store the password. The really naive approach is, of course, the very simple one: Just store it. You have a database with a table for all your users, and it would probably look something like (id, username, password). On user creation, you store the username and password in these fields as plaintext, and on login, you extract the row associated with the inputted username and compare the inputted password with the password from the database. If they match, you let your user in. Perfectly simple and easy to implement.
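To make the flow above concrete, here is a minimal sketch of the naive approach. It is shown only to illustrate what not to do. The table layout follows the (id, username, password) example; the function names and the in-memory SQLite database are assumptions made for the illustration.

```python
import sqlite3

# Hypothetical in-memory database using the (id, username, password) layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE, password TEXT)"
)

def create_user(username: str, password: str) -> None:
    # ANTI-PATTERN: the password is written to the database exactly as typed.
    conn.execute(
        "INSERT INTO users (username, password) VALUES (?, ?)",
        (username, password),
    )

def login(username: str, password: str) -> bool:
    # Fetch the stored password for this username and compare it directly.
    row = conn.execute(
        "SELECT password FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row is not None and row[0] == password

create_user("alice", "123456")
print(login("alice", "123456"))

# Anyone who can read the table can read every password.
print(conn.execute("SELECT username, password FROM users").fetchall())
```

The last line is the whole problem: a dump of the table is a dump of the passwords.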

Obviously (I hope), this is a tremendously bad and insecure way of storing passwords! If you were unfortunate enough to have your database hacked, every single one of your users’ passwords would be exposed directly to the naked eye. Let’s emphasize the significance of this disaster with some facts: Most people use the same password for all (or at least multiple) services. So not only did you jeopardize access to your own website, you also potentially revealed the passwords that your users are using elsewhere, for their personal email or online banking. Moreover, if you chose to store your users’ passwords in plaintext, you probably didn’t care much about sanitizing your inputs or using prepared statements to prevent SQL injection. The chances of your website being hacked might not even be that small.

Hopefully, you’re already cringing just reading this, and thinking to yourself: “This is all just common sense, no one actually does this”. Think again! As I’m writing this article in 2019, there are still many websites out there (hell, even big companies) that store passwords like this. If you feel alarmed by this fact, I don’t blame you. There’s a quick way to check whether your own password is stored in plaintext: try the ‘recover password’ function of the website. If you receive an email telling you what your password is, the website is storing it in plaintext or in some other recoverable form, and that is an enormous red flag!

Encrypting your passwords

The first approach to adding a layer of security on top of the passwords in your database would be to utilize encryption. Let’s quickly walk through how encryption works in a very simplified way:

A key is generated which will be used to encrypt messages.
This key will now encrypt messages in plaintext and turn them into ciphertext (gibberish).
Afterward, the same key can be used to decrypt the ciphertext, turning it into the original message.

With regard to storing passwords, the idea is to encrypt the password using the generated key before storing it in the database. Every time a user logs in, the inputted password is encrypted using the same key, and the result is compared to the ciphertext stored in the database. If a hacker found their way into the database, all they would see in the password field is the ciphertext.

While this is better than storing passwords in plaintext, it is still really insecure and really bad practice. Let’s bring some more facts to the table: According to Troy Hunt, 86% of all passwords are terrible! Let’s say that I, as a hacker, do manage to break into your database. Now I have a lot of entries with a lot of encrypted passwords. Statistically, there’s a high probability that at least one of these passwords is ‘123456’ or ‘password’ or ‘abc123’ (or one of the other top 100 most commonly used passwords). Because the same password always produces the same ciphertext, identical ciphertexts already tell me which users share a password. I would now try some of these common passwords, and if I were lucky enough to get a match against a weak or home-grown cipher, I could derive the encryption key from that one known pair. Or even better, I would write a simple little script to do it for me, trying the top 10,000 passwords against the database in no time! Done! All it takes is one match: knowing the key, I can decrypt every single one of the remaining passwords in the database. And even with a strong cipher, the key has to live somewhere the application can read it, so whoever got into your database may well get the key too.
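The key-recovery idea can be sketched with a deliberately weak toy cipher. The repeating-key XOR below is an assumption chosen purely for illustration and is not how any real system should encrypt anything. The key, the usernames, and the premise that the attacker correctly guesses one user's password are all hypothetical.

```python
from itertools import cycle

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: a toy cipher. Applying it twice decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

SECRET_KEY = b"k3y!"  # hypothetical key, known only to the server

# What the attacker finds in the stolen database.
stored = {
    "alice": toy_encrypt(b"123456", SECRET_KEY),
    "bob": toy_encrypt(b"correct horse", SECRET_KEY),
}

def smallest_period(stream: bytes) -> int:
    """Length of the shortest prefix that, repeated, reproduces the stream."""
    for n in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % n] for i in range(len(stream))):
            return n
    return len(stream)

# The attacker assumes alice uses the very common password "123456".
guess = b"123456"
keystream = bytes(c ^ p for c, p in zip(stored["alice"], guess))
recovered_key = keystream[: smallest_period(keystream)]

# One known plaintext/ciphertext pair is enough to read every other entry.
print(toy_encrypt(stored["bob"], recovered_key))
```

A modern cipher does not leak its key this way, but the other weaknesses remain: equal passwords give equal ciphertexts, and the key still has to be stored somewhere.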

Hashing your passwords

Instead of using encryption, you might choose to hash your passwords on creation. After all, in the above example, we never needed to decrypt the password after encryption. We only needed to make sure that the same password would produce the same ciphertext every time, and then simply compare those. This is a case where a hashing algorithm becomes beneficial.

Among others, the benefits of using a hashing algorithm are:

It is non-invertible. That is, it is not computationally realistic to reconstruct the input from the output hash.
The output hash is of fixed length. That is, the output hash will always be x characters long, whether the input is a single word or the collected works of Shakespeare.
The smallest change to the input will result in a completely different output hash. This is also known as the avalanche effect.
The same input produces the same output every time. In other words, the hashing algorithm is deterministic.
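Three of these properties are easy to observe directly. The sketch below uses SHA-256 from Python's standard `hashlib` module as an example hash function. The choice of SHA-256, the sample inputs, and the helper names are assumptions made for the illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

one_word = sha256_hex("password")
long_text = sha256_hex("To be, or not to be, that is the question. " * 10_000)
near_miss = sha256_hex("passwore")  # a single character changed

# Fixed length: both digests are 64 hex characters, regardless of input size.
print(len(one_word), len(long_text))

# Deterministic: hashing the same input again gives the same digest.
print(sha256_hex("password") == one_word)

# Avalanche effect: a one-character change alters most of the digest.
print(differing_positions(one_word, near_miss), "of 64 hex digits differ")
```

Non-invertibility cannot be demonstrated in a few lines; it is a claim about what nobody has found a practical way to do.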

If you want to learn more about hashing, I recommend reading this article by Raul Jordan.

By applying a hashing function before storing the password we can effectively circumvent the issue introduced when using encryption. But unfortunately, this is still an insecure way to store passwords!

A consequence of hashing functions in general is that there will always be a chance of a hash collision, that is, of two different inputs mapping to the same output hash. Given criterion number two above, that the output hash is of fixed length, it is mathematically provable that collisions are inevitable: there are more possible inputs than possible outputs. The mathematical principle behind this is known as the pigeonhole principle.
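The pigeonhole argument can be seen on a small scale by shrinking the output space. The sketch below keeps only the first byte of a SHA-256 digest, so there are just 256 possible outputs and any 257 distinct inputs must contain a collision. The truncation and the input naming scheme are assumptions made for the illustration.

```python
import hashlib
from typing import Optional, Tuple

def tiny_hash(text: str) -> int:
    """A deliberately tiny hash: the first byte of SHA-256, so 256 possible outputs."""
    return hashlib.sha256(text.encode("utf-8")).digest()[0]

def find_collision(limit: int = 257) -> Optional[Tuple[str, str]]:
    """Hash distinct inputs until two of them share an output."""
    seen = {}
    for i in range(limit):
        candidate = f"input-{i}"
        h = tiny_hash(candidate)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate
    return None  # cannot happen once limit exceeds the 256 possible outputs

first, second = find_collision()
print(first, second, tiny_hash(first) == tiny_hash(second))
```

A real hash function has vastly more outputs, so collisions are far harder to find, but the same counting argument guarantees that they exist.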

However, a cryptographic hashing function is said to be collision resistant if it is computationally infeasible to find two inputs that map to the same output hash. A good example of a well-known hashing algorithm that was once considered collision resistant is MD5, which was an industry standard for many years. However, in 1996 a flaw in the design of MD5 was found, and in 2004 it was shown that MD5 is not collision resistant.

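Recall that when we hash the password before storing it, we also only compare hashes when authenticating the login. That means that if I can find an input that produces the same hash as the one stored in the database, I can use this input to log in. Strictly speaking, hitting one specific stored hash is a harder problem than finding any two inputs that collide, but a function whose collision resistance has fallen is no longer one to trust.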

This is why one should completely avoid using MD5 for anything regarding authentication and storing passwords. The same goes for the popular SHA-1, for which practical collisions have also been demonstrated.

Fortunately, more modern and much safer algorithms exist today. The current industry standards for password hashing are PBKDF2, [bcrypt](en.wikipedia.org/wiki/Bcrypt), and [scrypt](en.wikipedia.org/wiki/Scrypt). However, even the most modern, cutting-edge hash function is still an insecure way of storing passwords if it is applied to the bare password alone.

Let’s recall the fact that 86% of all passwords are terrible. I, as a hacker, know that if I hash the password ‘123456’ with, say, MD5, I will always get the same output hash. Using a list of the most commonly used passwords, I would iterate through them, apply commonly used hashing algorithms, and store the output hashes. The next time I want to crack a hashed password, I simply do a lookup in this list of stored hashes, and chances are that I will find the original input there. In fact, I wouldn’t even have to create this list myself. People have already done this for a broad variety of hashing algorithms. Such precomputed lists are commonly referred to as rainbow tables. For the most commonly used passwords, it is actually no more difficult than pasting the hash into Google, and you would probably get the original password within the first couple of search hits. You would also learn which hashing algorithm was used to produce the hash.
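The precomputation described above takes only a few lines. The sketch below builds a plain lookup table from a tiny, made-up list of common passwords. Real tables cover millions of entries and often use a more compact structure. The list contents and the function names are assumptions made for the illustration.

```python
import hashlib
from typing import Optional

# A tiny illustrative sample; real attack lists are far longer.
COMMON_PASSWORDS = ["123456", "password", "abc123", "qwerty", "letmein"]

def md5_hex(password: str) -> str:
    """Unsalted MD5 of a password, as a lowercase hex string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Precompute once: hash -> original password.
LOOKUP = {md5_hex(p): p for p in COMMON_PASSWORDS}

def crack(leaked_hash: str) -> Optional[str]:
    """Return the original password if its hash is in the table, else None."""
    return LOOKUP.get(leaked_hash.lower())

leaked = md5_hex("abc123")  # stands in for a hash found in a stolen database
print(crack(leaked))
print(crack(md5_hex("a much less common passphrase")))
```

The table is built once and reused against every leaked database, which is what makes unsalted hashes of common passwords so cheap to reverse.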

Try it yourself! See if you can crack this hashed password: 482C811DA5D5B4BC6D497FFA98491E38

Hashing + salting

A way to protect against the attack described above is to apply a salt to the password before hashing it.

A salt is a long, randomly generated string that is concatenated with the password. A new salt is generated for every new user and is stored alongside the username and the hashed password.

In this way, the password ‘HappyFace’ would generate a completely different hash every time it is used to create a new user. It would not match the hash of the same password used on any other website, and even the same password used multiple times in the same database would result in completely different output hashes.
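As a minimal sketch of per-user salting, the code below uses PBKDF2 from Python's standard `hashlib` module rather than a plain concatenate-and-hash, since PBKDF2 is one of the algorithms named earlier and takes the salt as a parameter. The function names, the 16-byte salt, and the iteration count are assumptions made for the illustration; check current guidance before picking real values.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 100_000  # illustrative value only, not a recommendation

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, derived hash)."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

# Two users pick the same password but end up with different stored hashes.
salt_a, hash_a = hash_password("HappyFace")
salt_b, hash_b = hash_password("HappyFace")
print(hash_a != hash_b)
print(verify_password("HappyFace", salt_a, hash_a))
print(verify_password("SadFace", salt_a, hash_a))
```

The salt is not a secret. It is stored next to the hash, and its job is only to make every stored hash unique so that precomputed tables are useless.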

I, as a hacker, would then be left with only one choice: a brute-force attack. That is, simply trying every possible combination of characters until I stumble upon the correct password. Since the salt is stored next to the hash, it is the length and randomness of the password itself that decide how long this takes; if the password is strong enough, this is practically impossible to ever achieve.
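A rough back-of-the-envelope calculation shows why password length matters so much for brute force. The guessing rate below is a hypothetical figure chosen only to make the arithmetic concrete, and the helper names are assumptions too.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def search_space(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length

def years_to_exhaust(combinations: int, guesses_per_second: float) -> float:
    """Worst-case time to try every combination, in years."""
    return combinations / guesses_per_second / SECONDS_PER_YEAR

GUESSES_PER_SECOND = 1e10  # hypothetical attacker speed, for illustration only

# Six lowercase letters versus twelve printable ASCII characters.
weak = search_space(26, 6)
strong = search_space(94, 12)

print(weak, years_to_exhaust(weak, GUESSES_PER_SECOND))
print(strong, years_to_exhaust(strong, GUESSES_PER_SECOND))
```

Each extra character multiplies the search space by the alphabet size, so a longer password helps far more than a faster attacker hurts.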

So to conclude, if you really have to store passwords, the most secure approach is to apply hashing + salting, and to enforce strong passwords through well-known password policies.

Don’t store passwords at all

Even though you can apply a satisfactory layer of security by following the conventions above, it is still really, really easy to do this wrong! So if you absolutely have to store passwords, make sure you do proper research before beginning, and make sure to follow the latest security conventions of whichever framework or approach you choose. By the time you’re done reading this article, chances are that something has already changed. So always be sure to look up the latest standards.

In conclusion, the best way to store passwords is to not store passwords at all!

Google, Microsoft, Facebook, Twitter (…etc.) all have good, easy-to-implement, ready-to-use solutions for authenticating users. You can be sure that they hire some of the most experienced experts in this field, that they apply the latest security technologies available, and that they will be quick to make critical changes if any violation of their security standards is reported!

I strongly, strongly recommend always using these third-party solutions to authenticate users on your website, and simply not storing any passwords at all.
