Team LiB

Client-Side Caching of HTTP Content

When content is sent over the Internet to a client, generally a browser, that client can cache the content locally. A local cache reduces bandwidth usage and improves performance. Given that much, if not most, Internet content actually changes relatively little within the context of a single user session, caching is an invaluable tool for improving performance.

The general issue with caching is simple: change notification. Almost every Web developer has encountered the situation in which you change content on the Web server, but when you refresh, the old content appears. This is an example of the classic caching problem.

To address this problem with caching, there are two standard approaches in HTTP involving headers. The first method checks for the most recent modification time of the document, whereas the second method checks for changes in the entity tag (E-Tag) associated with the resource being requested.

Caching can also be controlled or modified using the Cache-Control and Pragma HTTP headers. These are generally used in the situation where you want to indicate that a particular document should not be cached. The Pragma header is used under HTTP 1.0 and is sent with a no-cache value to turn off caching, as shown next:

Pragma: no-cache

If you are using HTTP 1.1, use Cache-Control, which supersedes Pragma. The equivalent of the preceding Pragma statement is

Cache-Control: no-cache
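
Because a server often cannot know which protocol version a client or intermediate proxy speaks, both headers are commonly sent together. The following is a minimal sketch of that idea in PHP; the helper name noCacheHeaders() is hypothetical and not part of PHP or of the original text.

```php
<?php
// Hypothetical helper: returns the header lines that ask caches not to
// reuse a response without checking with the server first.
function noCacheHeaders(): array
{
    return [
        'Cache-Control: no-cache', // understood by HTTP 1.1 clients and proxies
        'Pragma: no-cache',        // fallback for HTTP 1.0 clients
    ];
}

// Headers can only be sent before any body output has started.
if (!headers_sent()) {
    foreach (noCacheHeaders() as $line) {
        header($line);
    }
}
```

Keeping the header lines in a function that returns an array makes them easy to inspect or reuse; the header() calls themselves must run before the script produces any output.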

Classic HTTP 1.0 caching is controlled using the If-Modified-Since header with GET requests. With this approach, the client instructs the server to send the data for the requested URL only if it has been modified since the time specified in the header. If the document was modified, the server sends it with a status code of 200. If the document wasn't modified, the server sends a status code of 304 (Not Modified) with no body, and the client reuses its cached copy.
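
The decision the server makes for such a conditional GET can be sketched as follows. This is an illustration under stated assumptions, not a complete implementation: the function name statusForIfModifiedSince() and the fixed timestamp are hypothetical, and a real script would take the modification time from the file or database row it serves.

```php
<?php
// Hypothetical helper: decides the status code for a conditional GET.
// $lastModified is a Unix timestamp for the document; $ifModifiedSince is
// the raw request header value, or null when the client did not send it.
function statusForIfModifiedSince(int $lastModified, ?string $ifModifiedSince): int
{
    if ($ifModifiedSince === null) {
        return 200; // unconditional request: send the document
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return 200; // unparseable date: ignore the header
    }
    // Changed after the client's copy was fetched: send it again.
    return $lastModified > $since ? 200 : 304;
}

// Assumed modification time, used here only for illustration.
$lastModified = gmmktime(7, 28, 0, 10, 21, 2015);
$status = statusForIfModifiedSince(
    $lastModified,
    $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null
);

if (!headers_sent()) {
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    http_response_code($status);
}
if ($status === 200) {
    // The document body would be sent here; a 304 response carries none.
}
```

Sending Last-Modified with every response is what gives the client a date to echo back in If-Modified-Since on its next request.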

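The counterpart to If-Modified-Since is the If-Unmodified-Since header. This header tells the server to send the data only if it hasn't been changed since the specified date; if it has changed, the server responds with a status code of 412 (Precondition Failed).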
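
The check for If-Unmodified-Since is the mirror image of the If-Modified-Since check. The sketch below assumes the server answers a failed precondition with 412 (Precondition Failed); the function name statusForIfUnmodifiedSince() is hypothetical.

```php
<?php
// Hypothetical helper: picks the status code for a request that carries
// If-Unmodified-Since. $lastModified is a Unix timestamp for the resource.
function statusForIfUnmodifiedSince(int $lastModified, ?string $ifUnmodifiedSince): int
{
    $since = $ifUnmodifiedSince === null ? false : strtotime($ifUnmodifiedSince);
    if ($since === false) {
        return 200; // header absent or unparseable: treat as unconditional
    }
    // Unchanged since the given date: proceed. Otherwise the precondition fails.
    return $lastModified <= $since ? 200 : 412;
}
```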

HTTP 1.1 introduced a new approach to cache management, the E-Tag, which is sent in the ETag response header. An E-Tag is an identifier associated with a particular version of a document and is typically computed from the document's content. If you think of an E-Tag as an MD5 hash of a document's content, you'll have a good handle on it (in fact, MD5 hashes are one of the ways that E-Tags are computed). The idea here is that if the document changes, the E-Tag will also change. This makes it easier to check only the E-Tag, not both the URL and Last-Modified date. Additionally, dynamic documents often lack a Last-Modified date, which is another reason to use E-Tags for cache management.

If you are programming in PHP, use the header() function to send your E-Tags before any output is produced, as shown next:

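<?php
 // $content is assumed to hold the response body that is about to be sent.
 $etag = md5($content);
 // An entity tag is a quoted string, so wrap the hash in double quotes.
 header('ETag: "' . $etag . '"');
?>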
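
Sending the tag is only half of the exchange: on a later request the client returns the tag in an If-None-Match header, and the server compares it with the current one. The following sketch shows one way that comparison might look. The helper etagMatches() and the placeholder body are hypothetical, and splitting the header on commas is a simplification that assumes the tags themselves contain no commas.

```php
<?php
// Hypothetical helper: true when the If-None-Match header lists the current
// entity tag, meaning the client's cached copy is still valid.
function etagMatches(string $currentEtag, ?string $ifNoneMatch): bool
{
    if ($ifNoneMatch === null) {
        return false; // no validator sent: nothing to compare
    }
    if (trim($ifNoneMatch) === '*') {
        return true; // "*" matches any existing version of the document
    }
    // Weak comparison: ignore a leading W/ marker on either side.
    $current = preg_replace('/^W\//', '', $currentEtag);
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        if (preg_replace('/^W\//', '', trim($candidate)) === $current) {
            return true;
        }
    }
    return false;
}

$content = '<html><body>Hello</body></html>'; // placeholder body
$etag = '"' . md5($content) . '"';            // quoted, as the header requires
$notModified = etagMatches($etag, $_SERVER['HTTP_IF_NONE_MATCH'] ?? null);

if (!headers_sent()) {
    header("ETag: $etag");
    http_response_code($notModified ? 304 : 200);
}
if (!$notModified) {
    echo $content; // a 304 response carries no body
}
```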

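When you are programming at the client-side level, the If-Match or If-None-Match request header is used to verify specific E-Tags: If-None-Match asks for the document only if its E-Tag differs from the ones listed, whereas If-Match asks the server to proceed only if the E-Tag still matches.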
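
A client written in PHP might assemble these validators as in the sketch below. The helper conditionalRequestHeaders(), the sample tag value, and the example.com URL are all assumptions made for illustration; the cached E-Tag and date would normally come from wherever the client stored the earlier response.

```php
<?php
// Hypothetical helper: builds the request headers for revalidating a cached
// copy. $cachedEtag is the quoted tag saved from an earlier response, and
// $cachedLastModified is a Unix timestamp; either may be null if unknown.
function conditionalRequestHeaders(?string $cachedEtag, ?int $cachedLastModified = null): array
{
    $headers = [];
    if ($cachedEtag !== null) {
        $headers[] = 'If-None-Match: ' . $cachedEtag;
    }
    if ($cachedLastModified !== null) {
        $headers[] = 'If-Modified-Since: '
            . gmdate('D, d M Y H:i:s', $cachedLastModified) . ' GMT';
    }
    return $headers;
}

// Attach the headers to an HTTP stream context for a later request.
$context = stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => conditionalRequestHeaders('"5d41402abc4b2a76b9719d911017c592"'),
    ],
]);
// The request itself is left commented out because the URL is a placeholder:
// $body = file_get_contents('http://www.example.com/page.php', false, $context);
```

If the server answers such a request with 304, the client keeps using its cached copy; with 200, it replaces the copy and stores the new E-Tag.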
