5.2. Self-Inflicted Attacks
Administrators often have only themselves to blame for service failure. Leaving a service configured with its default installation parameters is asking for trouble: such systems are very susceptible to DoS attacks, and even a simple traffic spike can destabilize them.
5.2.1. Badly Configured Apache
One thing to watch for with Apache is memory usage. Assuming Apache is running in prefork mode, each request is handled by a separate process. To serve one hundred requests at one time, a hundred processes are needed. The maximum number of processes Apache can create is controlled with the MaxClients directive, which is set to 256 by default. This default value is often used in production and that can cause problems if the server cannot cope with that many processes.
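As a sketch, a tuned prefork configuration might look like the following. The directive names are the real prefork ones, but the values are purely illustrative; choose yours from your own memory measurements rather than copying these:

```apache
# httpd.conf fragment (prefork MPM); values are illustrative only
<IfModule prefork.c>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild 4096
</IfModule>
```

Lowering MaxClients below the default 256 caps the number of simultaneous processes, trading some peak throughput for protection against memory exhaustion.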
Figuring out the maximum number of Apache processes a server can accommodate is surprisingly difficult. On a Unix system you cannot obtain precise figures on memory utilization; the best you can do is use the information you have, make assumptions, and then simulate traffic to verify them.
Looking at the output of the ps command, we can see how much memory a single process takes (look at the RSZ column as it shows the amount of physical memory in use by a process):
# ps -A -o pid,vsz,rsz,command
  PID   VSZ  RSZ COMMAND
 3587  9580 3184 /usr/local/apache/bin/httpd
 3588  9580 3188 /usr/local/apache/bin/httpd
 3589  9580 3188 /usr/local/apache/bin/httpd
 3590  9580 3188 /usr/local/apache/bin/httpd
 3591  9580 3188 /usr/local/apache/bin/httpd
 3592  9580 3188 /usr/local/apache/bin/httpd
In this example, each Apache instance takes 3.2 MB. With the default configuration in place, this server would need almost 1 GB of RAM to reach its peak capacity of serving 256 requests in parallel (256 × 3.2 MB ≈ 800 MB), and that assumes no additional memory will be needed for CGI scripts and dynamic pages.
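The arithmetic is simple enough to script. A small sketch, using the per-process figure from the ps output above (substitute your own measurement):

```python
# Estimate peak Apache memory use from a measured process size.
# 3188 KB per process comes from the RSZ column above; MaxClients
# defaults to 256.
rsz_kb = 3188
max_clients = 256

peak_mb = rsz_kb * max_clients / 1024
print(f"Peak physical memory: {peak_mb:.0f} MB")  # close to 800 MB
```

Run the same calculation with your own RSZ figures before settling on a MaxClients value.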
Do not be surprised if you see systems with very large Apache processes. Apache installations with a large number of virtual servers and complex configurations require large amounts of memory just to store the configuration data. Apache process sizes in excess of 30 MB are common.
So, suppose you are running a busy, shared hosting server with hundreds of virtual hosts, the size of each Apache process is 30 MB, and some of the sites have over 200 requests at the same time. How much memory do you need? Not as much as you may think.
Most modern operating systems (Linux included) have a feature called copy-on-write, and it is especially useful in cases like this one. When a process forks to create a new process (such as an Apache child), the kernel allocates the required amount of memory to accommodate the size of the process. However, this will be virtual memory (of which there is plenty), not physical memory (of which there is little). Memory locations of both processes will point to the same physical memory location. Only when one of the processes attempts to make changes to data will the kernel separate the two memory locations and give each process its own physical memory segment. Hence, the name copy-on-write.
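The semantics are easy to demonstrate with fork() itself. In this minimal, Unix-only sketch, parent and child initially share the same physical pages; the child's write forces the kernel to copy a page, so the parent's data is never affected:

```python
import os

# After fork(), parent and child initially share physical memory pages;
# the kernel copies a page only when one side writes to it, so a write
# in the child never alters the parent's data.
config = bytearray(b"ServerName a")

pid = os.fork()
if pid == 0:                        # child
    config[-1:] = b"b"              # this write triggers the page copy
    os._exit(0)                     # parent's copy is unaffected

os.waitpid(pid, 0)                  # back in the parent
print(config.decode())              # prints "ServerName a"
```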
As I mentioned, this works well for us. For the most part, Apache configuration data does not change during the lifetime of the server, and this allows the kernel to use one memory segment for all Apache processes.
5.2.2. Poorly Designed Web Applications
An application that talks to the database on every page request, even when it does not need to, can be a big problem, and this happens often in poorly written web applications. The approach works when the number of visitors is low, but it scales poorly.
The first bottleneck may be the maximum number of connections the database allows. Each request requires one database connection, so the database server must be configured to support at least as many connections as there are web server processes. Connecting to a database also takes time that would be better spent processing the request. Many web applications therefore support a feature called persistent database connections: the connection is kept open after script execution ends and is reused when the next request arrives. The drawback is that keeping connections open in this way puts additional load on the database; even an Apache process that is doing nothing but waiting for the next client holds a database connection open.
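The idea behind persistent connections can be sketched in a few lines. The function names here are hypothetical, and sqlite3 merely stands in for a networked database server:

```python
import sqlite3

# Hypothetical sketch of persistent database connections: connect once,
# then reuse the open connection for every subsequent request, instead
# of paying the connection cost on each page view.
_conn = None

def get_connection():
    global _conn
    if _conn is None:                 # only the first request connects
        _conn = sqlite3.connect(":memory:")
    return _conn                      # later requests reuse it

def handle_request():
    return get_connection().execute("SELECT 1").fetchone()[0]

handle_request()                      # first request opens the connection
handle_request()                      # second request reuses it
```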
Talking to a database also consumes a large amount of processor time, and a large number of concurrent page requests will force the server to hand all of its processor time to the database. For most sites this work is unnecessary, because the application and the database spend their time regenerating identical copies of the same web page. A better approach is to save the page to disk after it is generated for the first time and avoid talking to the database on subsequent requests.
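The save-to-disk approach can be sketched as follows. The names and the cache layout are invented for the illustration, not taken from any particular application:

```python
import hashlib
import os
import tempfile

# Illustrative page cache: generate a page once, store it on disk, and
# serve the stored copy on subsequent requests without touching the
# database.
CACHE_DIR = tempfile.mkdtemp()

def generate_page(url):
    # Placeholder for the expensive, database-backed page generation.
    return "<html>content for %s</html>" % url

def serve(url):
    path = os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest())
    if os.path.exists(path):          # cache hit: no database work at all
        with open(path) as f:
            return f.read()
    page = generate_page(url)         # cache miss: generate once...
    with open(path, "w") as f:        # ...and store it for the next request
        f.write(page)
    return page

serve("/news")                        # first request generates and stores
serve("/news")                        # second request is served from disk
```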
The most flexible approach is to perform page caching at the application level since that would allow the cached version to be deleted at the same time the page is updated (to avoid serving stale content). Doing it on any other level (using mod_cache in Apache 2, for example) would mean having to put shorter expiration times in place and would require the cache to be refreshed more often. However, mod_cache can serve as a good short-term solution since it can be applied quickly to any application.
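As a short-term measure, enabling mod_cache in Apache 2 might look something like the following sketch. The directive names come from mod_cache and mod_disk_cache, but the paths and expiration times are illustrative only:

```apache
# httpd.conf fragment: short-lived disk caching with mod_cache (Apache 2)
LoadModule cache_module       modules/mod_cache.so
LoadModule disk_cache_module  modules/mod_disk_cache.so

CacheRoot            /var/cache/apache
CacheEnable          disk /
# Keep expiration short so updated pages are not served stale for long
CacheMaxExpire       300
CacheDefaultExpire   60
```

Short expiration times are the price of caching outside the application: the cache cannot know when a page changes, so it must be refreshed often.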
You should never underestimate the potential mistakes made by beginning programmers. More than once I have seen web applications store images in a database and then fetch several of them from the database on every page request. Such use of the database brings a server to a crawl even with a modest amount of site traffic.
The concept of cacheability is important if you are preparing for a period of increased traffic, but it also can and should be used as a general technique to lower bandwidth consumption. It is said that content is cacheable when it is accompanied by HTTP response headers that provide information about when the content was created and how long it will remain fresh. Making content cacheable results in browsers and proxies sending fewer requests because they do not bother checking for updates of the content they know is not stale, and this results in lower bandwidth usage.
By default, Apache will do a reasonable job of making static documents cacheable. After having received a static page or an image from the web server once, a browser makes subsequent requests for the same resource conditional. It essentially says, "Send me the resource identified by the URL if it has not changed since I last requested it." Instead of returning the status 200 (OK) with the resource attached, Apache returns 304 (Not Modified) with no body.
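The decision can be sketched as follows. This is a simplification for illustration; the function and its names are not Apache internals:

```python
from email.utils import formatdate, parsedate_to_datetime

# Simplified conditional-GET check: return 304 with no body when the
# client's copy is still current, 200 with the resource otherwise.
def respond(last_modified, if_modified_since=None):
    if if_modified_since is not None:
        cached = parsedate_to_datetime(if_modified_since).timestamp()
        if last_modified <= cached:
            return 304, None              # Not Modified, empty body
    return 200, b"...resource body..."

mtime = 1_000_000_000                     # file's modification time
stamp = formatdate(mtime, usegmt=True)    # what the browser sends back

status, body = respond(mtime, stamp)      # unchanged since last visit
print(status)                             # prints 304
```

The 304 response carries headers only, which is why conditional requests save so much bandwidth for unchanged resources.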
Problems can arise when content is served through applications that are not designed with cacheability in mind. Most application servers completely disable caching under the (valid) assumption that it is better for applications not to have responses cached. This does not work well for content-serving web sites.
A good practice is to use a cacheability engine to test how cacheable an application's responses are, and then talk to the programmers about enhancing the application with proper HTTP caching support.
Detailed information about caching and cacheability is available at:
5.2.3. Real-Life Client Problems
Assume you have chosen to serve a maximum of one hundred requests at any given time. After performing a few tests from the local network, you may have seen that Apache serves the requests quickly, so you think you will never reach the maximum. There are some things to watch for in real life:
Unless you perform tests beforehand, you will never know how well the server will operate under a heavy load. Many free load-testing tools exist. I recommend you download one of the tools listed at: