11.5. API Abuse
As with any web-based system, APIs are sometimes a target for abuse. This can be intentional, but it happens even more easily by accident. Naive programmers using your services can easily cause an application to start hitting your services many times a second, bringing your service to its knees. Unlike the web pages of your application, API methods tend to be process-intensive because they each perform a specific task rather than contribute to a user dialog. Because APIs are designed to be programmatically callable, we make it much easier for malicious users to call our service; the barrier to entry is lower than ever.
But we're not just standing helpless against this wave of possible abuse. We can do a few things to mitigate and avoid disaster and certain doom.
11.5.1. Monitoring with API Keys
It's becoming common practice to require applications to register for "keys," which are usually long hexadecimal or alphanumeric strings. These keys can then be used at the server side to track application usage and see that nothing bad is going on. Aside from abuse tracking, we can use this technique to report usage statistics to application authors, a service that is often well appreciated. For distributed applications, the authors won't easily be able to collect usage statistics, but we'll know for sure how many times that application called our API.
If we see a key doing something bad, or we want to disable an application for some other reason, we can "expire" the key that application is using. Once a key is disabled, calls made with it no longer cost us any real processing overhead; each one immediately returns an error message explaining that the key has been disabled. By clearly documenting this error code, we can encourage third-party developers to build checks into their applications, so that if we do decide to disable the key, users can be told what's happening.
The problem with using keys as the sole identifying method is that for distributed applications, keys can easily be stolen. If they're in the application source code and get sent out across the wire, then they're trivial to extract. Instead of monitoring and blocking at the key level, we need to track calls at the level of key and IP address combination. If a single IP address is calling methods too often, then we can disable the IP address and key combination without affecting other legitimate users of the key.
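As a rough illustration of tracking and blocking at the level of key and IP address combination, the following Python sketch keeps everything in memory. The class name `CallTracker`, its methods, and the sample key and addresses are assumptions made for this example; a real system would presumably keep these counts in a shared store.

```python
from collections import Counter


class CallTracker:
    """Counts API calls per (key, IP) pair and can block a single pair."""

    def __init__(self):
        self.calls = Counter()   # (api_key, ip) -> number of calls seen
        self.blocked = set()     # (api_key, ip) pairs we have disabled

    def record(self, api_key, ip):
        """Return True if the call may proceed, False if the pair is blocked."""
        pair = (api_key, ip)
        if pair in self.blocked:
            return False
        self.calls[pair] += 1
        return True

    def block(self, api_key, ip):
        """Disable one key/IP pair without touching other users of the key."""
        self.blocked.add((api_key, ip))


# Hypothetical usage: one abusive address is blocked, the key itself is not.
tracker = CallTracker()
tracker.record("abc123", "203.0.113.7")
tracker.block("abc123", "203.0.113.7")
```

Because the block is keyed on the pair, the same key used from a different address keeps working.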
If we're tracking all of our API calls, then we can produce reports and see what's going badly. However, this can only give us the power to deal with a situation after it goes bad. The system slows down, Apache requests are up, and so we check the API-monitoring system. We see that an IP address using a certain key has started making 10 calls a second, so we shut it down.
This is not a great situation to get into; we want to be proactive rather than reactive and avoid impact on our application at all. We can do this by throttling or rate-limiting connections. There are several ways we can do this, but there are three basic principles for easy implementation: next service slot, timed bucketing, and leaky bucketing.
In a next service slot system, every time we receive a request we log the time in a table against the IP and key combination. When another request comes in, we check the value in the table. If it hasn't been a certain amount of time since the last call, then we abort with a rate-limiting error. If the allowed time has elapsed, we replace the tabled time with our current time and service the request. This is very easy to implement using memcached with expiring objects, which allows us to avoid filling up the cache as time goes by.
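A minimal sketch of the next service slot check might look like the following. It uses a plain Python dictionary in place of memcached, so nothing ever expires from it; the class name, the `min_interval` parameter, and the injectable `clock` are assumptions made for illustration.

```python
import time


class NextServiceSlotLimiter:
    """Allows one call per (key, IP) pair every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock        # injectable so the logic can be tested
        self.last_served = {}     # (api_key, ip) -> time of last serviced call

    def allow(self, api_key, ip):
        now = self.clock()
        pair = (api_key, ip)
        last = self.last_served.get(pair)
        if last is not None and now - last < self.min_interval:
            # Too soon: reject, and leave the stored time untouched.
            return False
        # Enough time has elapsed (or first call): store the new time.
        self.last_served[pair] = now
        return True
```

With a cache that supports expiry, each entry could be stored with a lifetime equal to the minimum interval, so that a missing entry simply means the caller is allowed through.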
In a timed bucket system, we define a period of time and an acceptable number of calls the service can execute within it. When a request is received, we look up the entry for its IP address and key combination. This entry contains only a call count. If the call count is at or above the limit for the time period, we don't service the request. If it's below the limit, or the entry doesn't exist, we increment the count (or set it to one for new entries). When creating new entries, we set the expiry time to the length of the period. When the period is up, the entry disappears, effectively setting the call count back to zero.
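The bucket logic described above can be sketched as follows. This is an in-memory approximation that emulates expiring entries by storing an expiry time next to each count; the class and parameter names are hypothetical.

```python
import time


class TimedBucketLimiter:
    """Allows at most `max_calls` per (key, IP) pair in each `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.entries = {}   # (api_key, ip) -> [call_count, expires_at]

    def allow(self, api_key, ip):
        now = self.clock()
        pair = (api_key, ip)
        entry = self.entries.get(pair)
        if entry is not None and now >= entry[1]:
            # Emulate cache expiry: the entry vanishes when its period ends.
            del self.entries[pair]
            entry = None
        if entry is None:
            # New entry: count starts at one and expires after one period.
            self.entries[pair] = [1, now + self.period]
            return True
        if entry[0] >= self.max_calls:
            return False
        entry[0] += 1
        return True
```

Note that the period starts with the first call in a bucket rather than on a fixed clock boundary, so a caller can make a full allowance of calls at the end of one period and another at the start of the next.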
In a leaky bucket system, we keep an entry for each IP address and key combination, counting the number of calls made. Every fixed period of time, we reduce the count in the bucket by a certain amount, making more room in the bucket (or emptying it, which deletes it). Whenever a call is made, we check to see if the bucket is full (its count has reached some maximum value); if so, the caller will have to wait until the bucket next leaks. While this can be the hardest method to implement, it allows you to set a limit such as 1,000 calls a day, but force those calls to be spread out over the day by making the maximum bucket size low and the leaking period short.
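One way to sketch a leaky bucket without a background job is to apply the leak lazily: on each call, work out how many leak periods have passed since the bucket last leaked and subtract accordingly. The class below takes that approach; it is an illustrative assumption rather than the only way to build one, and all names in it are hypothetical.

```python
import time


class LeakyBucketLimiter:
    """Bucket of size `capacity` that leaks `leak_amount` every `leak_period` seconds."""

    def __init__(self, capacity, leak_amount, leak_period, clock=time.monotonic):
        self.capacity = capacity
        self.leak_amount = leak_amount
        self.leak_period = leak_period
        self.clock = clock
        self.buckets = {}   # (api_key, ip) -> [count, time_of_last_leak]

    def allow(self, api_key, ip):
        now = self.clock()
        pair = (api_key, ip)
        bucket = self.buckets.get(pair)
        if bucket is not None:
            # Apply every leak that should have happened since the last one.
            periods = int((now - bucket[1]) // self.leak_period)
            if periods > 0:
                bucket[0] = max(0, bucket[0] - periods * self.leak_amount)
                bucket[1] += periods * self.leak_period
            if bucket[0] == 0:
                # An empty bucket is deleted.
                del self.buckets[pair]
                bucket = None
        if bucket is None:
            self.buckets[pair] = [1, now]
            return True
        if bucket[0] >= self.capacity:
            return False   # full: caller must wait for the next leak
        bucket[0] += 1
        return True
```

For a limit of roughly 1,000 calls a day, one possible setting is a leak of one call every 86.4 seconds (86,400 seconds divided by 1,000) with a small capacity such as 10, which permits short bursts but keeps the calls spread across the day.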
When creating rate-limiting systems it's important to build in exception mechanisms. Keys that you're using internally, or giving to partners, should be able to bypass any limits (although not bypass monitoring). Similarly, it can be useful to remove certain IPs from the limiting system: for instance, in order to allow easy development of client and server components in your development environment without ever hitting limits.
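A small sketch of such an exception mechanism follows. The exempt key, the exempt network, and the function names are placeholders invented for this example; the point it illustrates is that monitoring happens before the exemption check, so exempt callers skip the limit but still show up in the logs.

```python
import ipaddress

# Hypothetical exemptions: an internal key and a development network.
EXEMPT_KEYS = {"internal-key-1"}
EXEMPT_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_exempt(api_key, ip):
    """True if this key or address should bypass rate limiting."""
    if api_key in EXEMPT_KEYS:
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in EXEMPT_NETWORKS)


def handle_call(api_key, ip, log, limiter_allows):
    """Log every call, then apply the limiter unless the caller is exempt.

    `log` is any list-like sink; `limiter_allows` is any callable taking
    (api_key, ip) and returning True when the call is within limits.
    """
    log.append((api_key, ip))          # monitoring is never bypassed
    if is_exempt(api_key, ip):
        return True
    return limiter_allows(api_key, ip)
```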
As with any component of our application, we can skip a lot of work by caching the results of queries, and an API is no exception. Of course, we can only cache read requests; we always need to perform the writes. As with other caches, we need to be careful to invalidate the data correctly, so linking the API cache with your main data caching and expiration system is a good idea.
Because of the abstraction in the input and output components of our API system, we can cache results independently of the request and response formats, storing raw data objects that we then serialize on the fly (since serialization is typically a processor-only task, and CPU power is cheap). We just need to generate a key for the call based on the method name and argument list and check the cache. If we get a hit, we serialize the response and send it without further processing. If we get a miss, we perform the request code, set up the correct invalidation hooks, store the result in the cache, and serialize the output for the response.
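The hit and miss flow can be sketched in a few lines of Python. This version uses a module-level dictionary as the cache and a SHA-256 hash of the method name and sorted arguments as the key; both choices, along with the function names and the sample method name in the usage note, are assumptions for illustration. Invalidation hooks are reduced to a single `invalidate` function here, whereas a real system would tie them into its main data cache.

```python
import hashlib
import json

_cache = {}   # cache key -> raw (unserialized) result object


def cache_key(method, args):
    """Build a key from the method name and argument list.

    Arguments are sorted so that argument order does not change the key.
    """
    canonical = json.dumps([method, args], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def call_cached(method, args, handler, serialize):
    """Serve a read request from cache when possible.

    `handler` runs the real request code; `serialize` turns the raw
    result into the response format requested by the caller.
    """
    key = cache_key(method, args)
    if key not in _cache:
        # Miss: perform the request and store the raw result.
        _cache[key] = handler(**args)
    # Hit or miss, serialize on the fly from the raw object.
    return serialize(_cache[key])


def invalidate(method, args):
    """Drop a cached result, for example after a write changes the data."""
    _cache.pop(cache_key(method, args), None)
```

Because the cached value is the raw object, the same entry can serve callers that want different response formats; only the `serialize` argument changes.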
But it's not just caching on the server side that can help us avoid overloading the API; clients can sensibly cache a lot of data. While we have to ensure that the server side is always serving out fresh data, stale data in many applications is acceptable. Imagine someone develops a screensaver application that takes the latest news posts from our application and displays them in some fancy 3D way. The method we provide through our API should always provide the latest news item; we have to make sure we invalidate any cache when we update the news. The screensaver, on the other hand, can show news a few hours old because it doesn't have a responsibility to any application further down the chain. The screensaver can cache all the news it uses and avoid making too many calls to our service. For applications that need to make many API calls and get used by many users, a simple bit of client caching can make a massive reduction in server-side load.
But how can we build this caching into client applications when we only control the server API? Luckily we (hopefully) control the language bindings (or "kits") that developers are going to use. By building caching straight into the client libraries (and turning it on by default for easily cacheable methods), we encourage client applications to cache data. Often the reason for not caching on the client side is laziness rather than any technical reason, such as needing the freshest data. By providing zero-effort caching support, we can greatly increase the chance that applications will cache data, reducing the level of service we need to support.
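To make this concrete, here is a sketch of a client kit with caching switched on by default for methods marked as cacheable. The `ApiClient` class, the `transport` callable, the method name `news.getLatest`, and the one-hour lifetime are all hypothetical; the sketch also assumes argument values are hashable.

```python
import time


class ApiClient:
    """Client kit that caches results of easily cacheable methods by default."""

    # Hypothetical table of cacheable methods and their lifetimes in seconds.
    CACHEABLE = {"news.getLatest": 3600}

    def __init__(self, transport, use_cache=True, clock=time.monotonic):
        self.transport = transport   # callable (method, args) -> result
        self.use_cache = use_cache   # on by default; callers can opt out
        self.clock = clock
        self._cache = {}             # key -> (expires_at, result)

    def call(self, method, **args):
        ttl = self.CACHEABLE.get(method)
        if not self.use_cache or ttl is None:
            # Writes and uncacheable reads always go to the server.
            return self.transport(method, args)
        key = (method, tuple(sorted(args.items())))
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh enough for the client
        result = self.transport(method, args)
        self._cache[key] = (now + ttl, result)
        return result
```

An application author who genuinely needs the freshest data can pass `use_cache=False`; everyone else gets the caching without writing any code for it.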