Team LiB
Previous Section Next Section

5.2. Self-Inflicted Attacks

Administrators often have only themselves to blame for service failure. Leaving a service configured with default installation parameters is asking for trouble. Such systems are very susceptible to DoS attacks and a simple traffic spike can imbalance them.

5.2.1. Badly Configured Apache

One thing to watch for with Apache is memory usage. Assuming Apache is running in prefork mode, each request is handled by a separate process. To serve one hundred requests at one time, a hundred processes are needed. The maximum number of processes Apache can create is controlled with the MaxClients directive, which is set to 256 by default. This default value is often used in production and that can cause problems if the server cannot cope with that many processes.

Figuring out the maximum number of Apache processes a server can accommodate is surprisingly difficult. On a Unix system, you cannot obtain precise figures on memory utilization. The best thing we can do is to use the information we have, make assumptions, and then simulate traffic to correct memory utilization issues.

Looking at the output of the ps command, we can see how much memory a single process takes (look at the RSZ column as it shows the amount of physical memory in use by a process):

# ps -A -o pid,vsz,rsz,command
 3587  9580 3184 /usr/local/apache/bin/httpd
 3588  9580 3188 /usr/local/apache/bin/httpd
 3589  9580 3188 /usr/local/apache/bin/httpd
 3590  9580 3188 /usr/local/apache/bin/httpd
 3591  9580 3188 /usr/local/apache/bin/httpd
 3592  9580 3188 /usr/local/apache/bin/httpd

In this example, each Apache instance takes 3.2 MB. Assuming the default Apache configuration is in place, this server requires 1 GB of RAM to reach the peak capacity of serving 256 requests in parallel, and this is only assuming additional memory for CGI scripts and dynamic pages will not be required.

Most web servers do not operate at the edge of their capacity. Your initial goal is to limit the number of processes to prevent server crashes. If you set the maximum number of processes to a value that does not make full use of the available memory, you can always change it later when the need for more processes appears.

Do not be surprised if you see systems with very large Apache processes. Apache installations with a large number of virtual servers and complex configurations require large amounts of memory just to store the configuration data. Apache process sizes in excess of 30 MB are common.

So, suppose you are running a busy, shared hosting server with hundreds of virtual hosts, the size of each Apache process is 30 MB, and some of the sites have over 200 requests at the same time. How much memory do you need? Not as much as you may think.

Most modern operating systems (Linux included) have a feature called copy-on-write, and it is especially useful in cases like this one. When a process forks to create a new process (such as an Apache child), the kernel allocates the required amount of memory to accommodate the size of the process. However, this will be virtual memory (of which there is plenty), not physical memory (of which there is little). Memory locations of both processes will point to the same physical memory location. Only when one of the processes attempts to make changes to data will the kernel separate the two memory locations and give each process its own physical memory segment. Hence, the name copy-on-write.

As I mentioned, this works well for us. For the most part, Apache configuration data does not change during the lifetime of the server, and this allows the kernel to use one memory segment for all Apache processes.

If you have many virtual servers do not put unnecessary configuration directives into the body of the main server. Virtual servers inherit configuration data from the main server, making the Apache processes larger.

5.2.2. Poorly Designed Web Applications

Having an application that communicates to a database on every page request, when it is not necessary to do so, can be a big problem. But it often happens with poorly written web applications. There is nothing wrong with this concept when the number of visitors is low, but the concept scales poorly.

The first bottleneck may be the maximum number of connections the database allows. Each request requires one database connection. Therefore, the database server must be configured to support as many connections as there can be web server processes. Connecting to a database can take time, which can be much better spent processing the request. Many web applications support a feature called persistent database connections. When this feature is enabled, a connection is kept opened at the end of script execution and reused when the next request comes along. The drawback is that keeping database connections open like this puts additional load on the database. Even an Apache process that does nothing but wait for the next client keeps the database connection open.

Unlike for most database servers, establishing a connection with MySQL server is quick. It may be possible to turn persistent connections off in software (e.g., the PHP engine) and create connections on every page hit, which will reduce the maximum number of concurrent connections in the database.

Talking to a database consumes a large amount of processor time. A large number of concurrent page requests will force the server to give all processor time to the database. However, for most sites this is not needed since the software and the database spend time delivering identical versions of the same web page. A better approach would be to save the web page to the disk after it is generated for the first time and avoid talking to the database on subsequent requests.

The most flexible approach is to perform page caching at the application level since that would allow the cached version to be deleted at the same time the page is updated (to avoid serving stale content). Doing it on any other level (using mod_cache in Apache 2, for example) would mean having to put shorter expiration times in place and would require the cache to be refreshed more often. However, mod_cache can serve as a good short-term solution since it can be applied quickly to any application.

You should never underestimate the potential mistakes made by beginning programmers. More than once I have seen web applications store images into a database and then fetch several images from the database on every page request. Such usage of the database brings a server to a crawl even for a modest amount of site traffic.

The concept of cacheability is important if you are preparing for a period of increased traffic, but it also can and should be used as a general technique to lower bandwidth consumption. It is said that content is cacheable when it is accompanied by HTTP response headers that provide information about when the content was created and how long it will remain fresh. Making content cacheable results in browsers and proxies sending fewer requests because they do not bother checking for updates of the content they know is not stale, and this results in lower bandwidth usage.

By default, Apache will do a reasonable job of making static documents cacheable. After having received a static page or an image from the web server once, a browser makes subsequent requests for the same resource conditional. It essentially says, "Send me the resource identified by the URL if it has not changed since I last requested it." Instead of returning the status 200 (OK) with the resource attached, Apache returns 304 (Not Modified) with no body.

Problems can arise when content is served through applications that are not designed with cacheability in mind. Most application servers completely disable caching under the (valid) assumption that it is better for applications not to have responses cached. This does not work well for content-serving web sites.

A good thing to do would be to use a cacheability engine to test the cacheability of an application and then talk to programmers about enhancing the application by adding support for HTTP caching.

Detailed information about caching and cacheability is available at:

5.2.3. Real-Life Client Problems

Assume you have chosen to serve a maximum of one hundred requests at any given time. After performing a few tests from the local network, you may have seen that Apache serves the requests quickly, so you think you will never reach the maximum. There are some things to watch for in real life:

Slow clients

Measuring the speed of request serving from the local network can be deceptive. Real clients will come from various speeds, with many of them using slow modems. Apache will be ready to serve the request fast but clients will not be ready to receive. A 20-KB page, assuming the client uses a modem running at maximum speed without any other bottlenecks (a brave assumption), can take over six seconds to serve. During this period, one Apache process will not be able to do anything else.

Large files

Large files take longer to download than small files. If you make a set of large files available for download, you need to be aware that Apache will use one process for each file being downloaded. Worse than that, users can have special download software packages (known as download accelerators), which open multiple download requests for the same file. However, for most users, the bottleneck is their network connection, so these additional download requests have no impact on the download speed. Their network connection is already used up.

Keep-Alive functionality

Keep-Alive is an HTTP protocol feature that allows clients to remain connected to the server between requests. The idea is to avoid having to re-establish TCP/IP connections with every request. Most web site users are slow with their requests, so the time Apache waits, before realizing the next request is not coming, is time wasted. The timeout is set to 15 seconds by default, which means 15 seconds for one process to do nothing. You can keep this feature enabled until you reach the maximum capacity of the server. If that happens you can turn it off or reduce the timeout Apache uses to wait for the next request to come. Newer Apache versions are likely to be improved to allow an Apache process to serve some other client while it is waiting for a Keep-Alive client to come up with another request.

Unless you perform tests beforehand, you will never know how well the server will operate under a heavy load. Many free load-testing tools exist. I recommend you download one of the tools listed at:

"Web Site Test Tools and Site Management Tools," maintained by Rick Hower (
    Team LiB
    Previous Section Next Section