Section 11.6. Authentication

At some point, if we want our API to be more than just a feed multiplexer, we're going to need to allow people to write data and make changes to our dataset. In this case, we almost certainly want to be able to tell who the calling user is, verify that they're allowed to perform the action, and record who performed it.

In the olden days, way back in the 20th century, we had what some folks like to call the Internet 1.0 specifications. Early Internet (and then web) RFC documents lacked any kind of security beyond passwords sent over plain text. HTTP basic authentication requires a base64 version of your password, which is close to plain text. FTP uses plain-text passwords transmitted over regular sockets. POP and SMTP expect your password to be sent as plain text for every request.

The days of it being acceptable to blithely throw your authentication details around the network in a readable format are (mostly) long gone. While this was all good and fine when there were 100 academics swapping research papers, with today's packet sniffers and malicious users, we want to avoid giving away all of our secrets.

In addition to avoiding sending our authentication details over the wire in a plain format, we need to avoid a couple of other common attacks. If we hash our password somehow to send it, we can't use the same hash each time because someone could steal the hashed version and use it in their own requests. This is known as a simple masquerade attack. Supposing that the hashed password is hashed using some details of the request, the hash cannot just be stolen since it won't match any other set of call parameters. However, the whole message, parameters and all, can be sent again to the server by the attacker; the hash will match and the command will get executed. This is known as a replay attack.

Avoiding replay attacks is harder: we need to use either nonces or time synchronization (or both); you will have seen both of these earlier with the Atom WSSE implementation details. With nonces, we generate a random string on the client side and use it as part of our hash. The server checks if that nonce has been used before, allowing only one usage per nonce per client. The downside with this is that we need to track all the nonces on the server side. With time synchronization, the server publishes its time via some mechanism. When a client wants to send a message, it syncs its clock, generates a timestamp, and uses it in the hash. On the server side, we check that the time is within an acceptable limit, then compute the hash. Messages then can't be replayed unless the replay falls within the allowed time window. By combining both nonces and time synchronization, we avoid having to store all nonces forever (we keep them only for the length of the message validity window) and avoid the quick-replay holes of simple time synchronization (since the nonce can't be repeated in that window).
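The combined nonce and timestamp check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the class name ReplayGuard, the five-minute window, and the in-memory dictionary are all assumptions made for the example, and the hash check itself is assumed to happen separately.

```python
import time


class ReplayGuard:
    """Hypothetical server-side check: timestamp window plus nonce tracking."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        # (client_id, nonce) -> timestamp carried by the accepted message
        self.seen = {}

    def accept(self, client_id, nonce, timestamp, now=None):
        now = time.time() if now is None else now

        # Forget nonces from messages that are now too old to be replayed
        # anyway; this is what keeps the nonce store from growing forever.
        self.seen = {
            key: ts for key, ts in self.seen.items() if now - ts <= self.window
        }

        # Time synchronization: reject messages outside the validity window.
        if abs(now - timestamp) > self.window:
            return False

        # Nonce check: each client may use a given nonce only once.
        key = (client_id, nonce)
        if key in self.seen:
            return False

        self.seen[key] = timestamp
        return True
```

A real service would keep the nonce store somewhere shared between web servers (a database or cache), and would only call accept() after the hash on the message has been verified.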

So how does all this apply to web services authentication? Well, we need to first take a step back and think about the different ways in which we could handle authentication.

11.6.1. None at All

The simplest method of all is to not allow authenticated calls. This should initially be seriously considered because an API without write access still presents a lot of value. Implementing a read-only API can be a good starting point and an effective stepping stone to providing a full read-write API at a later date.

11.6.2. Plain Text

To support authentication at the most basic level, we can ask users to send authentication details to the API in plain text. We can then easily compare it against our own copy of the password, whether stored in plain text or hashed in some way. This is great for testing, as it allows you to easily add and remove authentication details from a request by hand and clearly see what's going on.
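As a rough sketch of the comparison step when our copy of the password is stored hashed, the following Python uses the standard library's PBKDF2 function. The function names, the salt length, and the iteration count are assumptions chosen for illustration, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest) for storage; a fresh salt is made if none is given."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def check_password(submitted, salt, stored_digest, iterations=100_000):
    """Hash the plain-text password from the request and compare to our copy."""
    _, candidate = hash_password(submitted, salt, iterations)
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, stored_digest)
```

Because the client sends the password itself, the server can hash it in whatever way matches its stored copy; this is exactly what stops working once the client sends a signature instead of the password.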

To achieve a level of security, we can run our services over HTTPS instead of plain HTTP, which avoids giving our password away to packet sniffers. This is very effective but has a couple of problems. First, we need to implement HTTPS on the server side, which is not easy. The crypto needs of HTTPS typically make web servers CPU bound and decrease performance by around 90 percent (of course, your figures may vary wildly based on your hardware and software configuration). As such, HTTPS servers are pretty vulnerable to DDoS attacks, because a flood of clients can easily tie up your server CPU.

If you've implemented bulletproof HTTPS on the server side, there's still the client side to worry about. Not all clients can support HTTPS easily. Neither Perl nor PHP will allow HTTPS calls out of the box, without OpenSSL extensions. Java and .NET fare a little better: .NET provides SSL support out of the box, while the optional Java package javax.net.ssl can provide the needed functionality.

11.6.3. Message Authentication Code (MAC)

A MAC is a variation on a cryptographic hashing function. We pass in a message and secret key, and a MAC (or tag) is generated. This differs from a regular hashing function in its requirements: the ability of an attacker to find collisions isn't such a big deal, but an attacker who doesn't know the key mustn't be able to produce a valid MAC for any new message, even after seeing valid MACs for other messages (doing so is known as existential forgery). A function to generate MACs also differs from digital signatures, in that both the writer and the reader share the secret key.
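To make the idea concrete, here is a small Python sketch that generates and checks a MAC using HMAC-SHA256 from the standard library. HMAC is just one common MAC construction, picked here as an assumption; the function names make_mac and verify_mac are likewise invented for the example.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> str:
    """Generate a tag for the message using the shared secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_mac(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag with our copy of the key and compare in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)
```

Both sides call the same function with the same key, which is the practical meaning of the shared secret: whoever can verify a tag can also create one.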

This sharing of the secret key can be an issue for web applications. We typically don't store users' passwords in plain text, so having users sign their messages using their plain-text password is no good to us. To get around this, we can either store the users' passwords in plain text, issue a special password to be used for signing API calls (one we have plain-text access to), or move to a full-blown token system.

Even passing signed messages over HTTPS presents a significant security hole; we're asking users to present their plaintext authentication details to a third-party application. If we're allowing anybody to create an application built against our API, then we don't want to have all those applications asking for user authentication credentials. In the age of phishing, all it takes is one nicely presented application to steal user login details; token-based systems can solve this problem.

11.6.4. Token-Based Systems

If we don't want our users entering their authentication details into third-party applications, and we want to be able to share a secret key with applications, token-based systems provide a viable solution. The implementation can be complicated, but the basic process looks like this:

The third-party application asks the user to authenticate.
The user is sent a special URL within the web application and asked to log in in the usual manner.
Our application asks the user if he really wants to allow the third-party application to act on his behalf.
Our application generates a token code and passes it back to the third-party application.
The third-party application then uses the token to sign API method calls, generating a MAC using time synchronization and a nonce.
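The last two steps above might look something like the following Python sketch. Everything named here is hypothetical: the in-memory TOKENS table stands in for a database, the user_id and app_id fields are an assumed way of telling the server which token to look up, and HMAC-SHA256 over the sorted parameters is one possible signing scheme among many. The server-side nonce and timestamp freshness checks are left out to keep the example short.

```python
import hashlib
import hmac
import secrets
import time
from urllib.parse import urlencode

# (user_id, app_id) -> token; an in-memory stand-in for real storage.
TOKENS = {}


def issue_token(user_id, app_id):
    """Generate a token tying this user to this third-party application."""
    token = secrets.token_hex(16)
    TOKENS[(user_id, app_id)] = token
    return token


def _signature(token, fields):
    # Sort the parameters so client and server build the same base string.
    base = urlencode(sorted(fields.items()))
    return hmac.new(token.encode(), base.encode(), hashlib.sha256).hexdigest()


def sign_call(token, user_id, app_id, method, params):
    """Client side: add a nonce, a timestamp, and a MAC to the call."""
    fields = dict(params)
    fields.update(
        user_id=user_id,
        app_id=app_id,
        method=method,
        nonce=secrets.token_hex(8),
        timestamp=str(int(time.time())),
    )
    fields["sig"] = _signature(token, fields)
    return fields


def verify_call(fields):
    """Server side: look up the token for this user and app, recompute the MAC."""
    fields = dict(fields)
    sig = fields.pop("sig", "")
    token = TOKENS.get((fields.get("user_id"), fields.get("app_id")))
    if token is None:
        return False
    return hmac.compare_digest(_signature(token, fields), sig)
```

Because the token is looked up by the user and application pair, a signature made with one application's token will not verify for a different application or a different user.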

The token is generated uniquely, tying the user to the third-party application: only that application can use it and only for that user. There are several steps in this process for which the implementation details are a little hazy or complicated. The application first needs some way to launch the user into a browser session with the host application. For third-party web applications, this is simple, but for desktop applications it gets trickier, and for mobile applications, trickier still.

Once the token has been generated, we need to pass it back to the third-party application, so we'll need some way of contacting it. This transfer contains the secret for the user, so we can't send it back over an unsecured connection. We either need to send it over HTTPS, have the third-party application request it back over HTTPS, or pass it back in plain text and use it in combination with some other nontransmitted secret. For the latter, we can issue separate secrets to each third-party application for it to use in the signature process. This works well for web-based third-party applications that have well-hidden source, but is no good for desktop applications, since the application's secret can be extracted from the source code.

For a secure API authentication system to be suitable for large-scale usage, we need to meet three core criteria:

We can start small and try to hit these criteria one by one. When we've satisfied them all, we're left with an API developers will want to use and users will feel safe in using. Our API will be effective, easy to build against, secure, and well monitored. We might as well pack up and go home.