Last updated at Wed, 06 Dec 2017 21:20:00 GMT
Synopsis
We like writing web applications. Increasingly, software that might have once run as a desktop application now runs on the web. So how do we properly authenticate users in web contexts? Passwords, of course! Let’s look at how to handle password authentication on the web.
What not to do
It’s very tempting to use plaintext authentication for web applications. Let’s whip up a quick Python web server that we’ll use to test authentication. Let’s say we want to serve a magic number, but only to users we have granted access.
from flask import Flask, request

app = Flask(__name__)

# app globals
secret_number = 42

# routes
@app.route("/secret_number")
def get_secret_number():
    # Flask views must return a string or response object, not a bare int
    return str(secret_number)

if __name__ == "__main__":
    app.run()
And curling yields:
$ curl 'localhost:5000/secret_number'
42
Right now, there’s no authentication. What if we want to protect the magic number with a username and password? It would be easy to do plaintext authentication, like so:
# app globals
secret_number = 42
users = [
    { 'name': 'bob', 'password': 'QNJWzjRc' }
]

# helpers
def user_allowed(username, password):
    # any() returns a real boolean; a bare filter() object is always truthy in Python 3
    return any(
        user['name'] == username and user['password'] == password
        for user in users
    )

# routes
@app.route("/secret_number")
def get_secret_number():
    if user_allowed(
        request.args.get('username'),
        request.args.get('password')
    ):
        return str(secret_number)
    return 'not allowed'
Curling as we did before, we see the expected response:
$ curl 'localhost:5000/secret_number'
not allowed
However, if we pass the correct username and password arguments as URL parameters, we can still retrieve the magic number.
$ curl 'localhost:5000/secret_number?username=bob&password=QNJWzjRc'
42
Why is this a bad idea?
Despite the ease in implementing an authentication system like this, it’s really not a good idea to check passwords this way. Here’s why:
- If someone gets access to your users list, they can pretend to be an authenticated user.
- Since people tend to reuse passwords across sites, a user’s password being read from your site might lead to a catastrophic disclosure of information on other sites!
Yet we still log in to websites with passwords. There must be a better way.
Consider a practice from a non-technical area: you wouldn’t tell someone your social security number, but you might give them a piece of information that shows you know it, like your SSN’s last four digits. Someone who identifies you by those last four digits never needs to learn your full SSN.
Similarly, it would be nice if we had a way to verify that someone knows their password without actually storing their password in its entirety. That way, if an attacker reads your users list, they don’t get users’ actual passwords.
Handy One Way Functions
It turns out that we can solve these issues by employing (easy to use) cryptographic techniques. Most mathematical functions, e.g. f(x) = x + 2, have an inverse, in this case f^-1(x) = x - 2. Some functions, however, such as g(x) = x mod 3, have no inverse, since multiple values in the domain of the function can map to the same output, e.g. g(1) = g(301) = 1.
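To make the difference concrete, here is a small sketch of the two functions above. The Python names f, f_inverse and g are just illustrative stand-ins for the formulas in the text.

```python
def f(x):
    # f(x) = x + 2
    return x + 2

def f_inverse(y):
    # Undoes f: f_inverse(f(x)) == x for every x
    return y - 2

def g(x):
    # g(x) = x mod 3
    return x % 3

# f can be undone: each output maps back to exactly one input.
assert f_inverse(f(40)) == 40

# g cannot be undone: many inputs collapse onto the same output.
collisions = [x for x in range(302) if g(x) == 1]
print(g(1), g(301))    # 1 1
print(collisions[:4])  # [1, 4, 7, 10]
```

Given only the output 1, there is no way to tell which of those inputs produced it.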
We can use similar mathematical functions to take a password and create an identifier based on it that, in practice, could only be generated by processing that password. As an example, the SHA256 algorithm takes the password password123 and turns it into:
ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
If we take a slightly different password, say pAssword123, and run it through SHA256, we’ll get a completely different result:
f5355765f831ee3c9fb35e3a3c701887f6ac33a39fbad7f1740759558716fccf
The crucial realization is that we can’t easily derive the supplied data from this seemingly random string of letters and numbers. SHA256 is, as we described above, a one-way function. This makes the generated string of letters and numbers a sort of ‘fingerprint’ of the data that generates it.
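If you want to reproduce these fingerprints yourself, a minimal sketch using Python’s standard hashlib module looks like this. The helper name sha256_hex is made up for this example.

```python
import hashlib

def sha256_hex(text):
    # Hash the UTF-8 bytes of the text and return the digest as hex
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = sha256_hex("password123")
second = sha256_hex("pAssword123")

print(first)
print(second)

# Changing a single character changes the whole digest
print(first == second)                     # False
# The same input always produces the same digest
print(first == sha256_hex("password123"))  # True
```

Both digests are 64 hexadecimal characters long regardless of how long the input is.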
You’ve probably heard of other functions like this; common hash functions include the SHA family and MD5. Going forward, we will be using special password hashing functions (such as bcrypt, which is built on the Blowfish cipher) that are deliberately designed to take a long time to compute.
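As a rough illustration of what “designed to take a long time” means, the sketch below uses PBKDF2 from Python’s standard library, which repeats an underlying hash many times. The helper names, the iteration count and the random salt are assumptions made for this example, not values taken from any particular framework.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration

def hash_password(password, salt=None, iterations=ITERATIONS):
    # A random salt makes equal passwords produce different hashes
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest

def verify_password(password, salt, iterations, expected):
    # Recompute with the stored salt and iteration count, then compare
    # in constant time
    _, _, candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, expected)

salt, iterations, stored = hash_password("QNJWzjRc")
print(verify_password("QNJWzjRc", salt, iterations, stored))     # True
print(verify_password("wrong-guess", salt, iterations, stored))  # False
```

Raising the iteration count makes every guess more expensive for an attacker, at the cost of slightly slower logins.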
How do we apply this to our web server?
Taking things back to code, we can use simple APIs for hash functions built into most web platforms. Flask, for example, makes use of Werkzeug’s generate_password_hash and check_password_hash to generate and verify hashed passwords.
Note: Some hash functions are more suitable for storing passwords than others; existing general hashing functions like MD5 or even SHA256 shouldn’t be used due to the ease with which they can be computed. Password-specific hash functions like bcrypt and PBKDF2 (which in turn uses general hash functions) should be employed instead.
We’re ok using Werkzeug’s API, which ships with Flask and used PBKDF2 by default at the time of writing. Let’s augment our previous example.
Adding an import for our hash API functions:
from flask import Flask, request
from werkzeug.security import generate_password_hash, check_password_hash
app = Flask(__name__)
We also need to store our passwords using the password’s hash, rather than the password itself:
users = [
    { 'name': 'bob', 'hash': generate_password_hash('QNJWzjRc') }
]
Note: Needless to say, we would never ever hardcode passwords in a production application.
Finally, we just need to update our user_allowed function to check the user’s password against the stored hash:
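def user_allowed(username, password):
    # A request without a password can never match, so skip hashing entirely
    if password is None:
        return False
    # any() returns a real boolean; a bare filter() object is always truthy in Python 3
    return any(
        user['name'] == username and
        check_password_hash(user['hash'], password)
        for user in users
    )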
Once again, we can successfully retrieve our magic number with the right password as before, and are disallowed from viewing the secret number otherwise. Now, however, if our user list is leaked, an attacker only sees hashes rather than the passwords themselves.
(Pdb) p users
[{
'hash': 'pbkdf2:sha1:1000$WNYvjwGm$241780fe007f981b9c9959b3416693bb689b9f91',
'name': 'bob'
}]
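The stored string above appears to pack everything needed for verification into one value: the method and iteration count, a salt, and the resulting digest, separated by `$`. The sketch below assumes that layout and re-implements it with the standard library purely for illustration. The functions make_entry and check_entry and the salt value are hypothetical, and this is not Werkzeug’s actual implementation.

```python
import hashlib
import hmac

def make_entry(password, salt, hash_name="sha1", iterations=1000):
    # Assumed layout: pbkdf2:<hash>:<iterations>$<salt>$<hex digest>
    digest = hashlib.pbkdf2_hmac(
        hash_name, password.encode("utf-8"), salt.encode("utf-8"), iterations
    ).hex()
    return f"pbkdf2:{hash_name}:{iterations}${salt}${digest}"

def check_entry(entry, password):
    # Split the entry back into its parts and recompute the digest
    method, salt, expected = entry.split("$", 2)
    scheme, hash_name, iterations = method.split(":")
    if scheme != "pbkdf2":
        raise ValueError("unsupported scheme: " + scheme)
    candidate = hashlib.pbkdf2_hmac(
        hash_name, password.encode("utf-8"), salt.encode("utf-8"), int(iterations)
    ).hex()
    return hmac.compare_digest(candidate, expected)

entry = make_entry("QNJWzjRc", salt="examplesalt")  # hypothetical salt
print(entry.split("$")[0])               # pbkdf2:sha1:1000
print(check_entry(entry, "QNJWzjRc"))    # True
print(check_entry(entry, "not-the-pw"))  # False
```

An attacker holding such an entry still has to guess candidate passwords and run the full computation for each one, which is exactly what the iteration count makes expensive.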
And now we have a pretty good layer of security around our stored passwords.