Tweaking Apache for Higher Loads

The MaxClients directive sets the number of simultaneous connections Apache will accept. By default, Apache on Fedora Core has a compile-time hard limit of 256 for MaxClients. Some high-volume sites need more simultaneous connections than this; to go beyond 256, you must recompile Apache with special options.

Note

On heavily used web servers, professional web administrators at Rackspace often help customers set MaxClients to 1,024 or even 2,048. However, even the most high-volume web servers rarely see more than 1,500 simultaneous sessions. Note that 1,500 simultaneous sessions, with a 2 minute average session time, would yield over one million sessions per day! That's a busy web server indeed.

Even if you leave MaxClients set at 256 and expect a 2 minute average session time, you still have a maximum capacity of roughly 184,000 sessions per day (256 slots, each serving 720 two-minute sessions in 24 hours). Remember that content type greatly affects the average session duration, and thus the capacity of your server under its simultaneous session limit. If you offer only text files or file downloads, your session duration will typically be much shorter than if you host an online bank, stock brokerage, or interactive forums, where your average session time may be up to 30 minutes. This is why there are so many administrator-adjustable variables in httpd.conf. Before you make such adjustments, you need a good feel for how your customers behave in relation to your web content.
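The capacity arithmetic in this section is simply the number of connection slots multiplied by the number of sessions each slot can serve in a day. The following is a minimal sketch of that calculation; the function name daily_session_capacity is made up for illustration, and the result is a theoretical ceiling that assumes every slot stays busy around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_session_capacity(max_clients: int, avg_session_seconds: float) -> int:
    """Theoretical ceiling on sessions per day if every slot is always busy."""
    if max_clients <= 0 or avg_session_seconds <= 0:
        raise ValueError("max_clients and avg_session_seconds must be positive")
    # Each slot can serve this many back-to-back sessions in one day.
    sessions_per_slot = SECONDS_PER_DAY / avg_session_seconds
    return int(max_clients * sessions_per_slot)


# MaxClients values and session lengths mentioned in the text:
# 2-minute sessions at 256 and 1,500 slots, then 30-minute sessions at 256 slots.
for clients, seconds in [(256, 120), (1500, 120), (256, 30 * 60)]:
    print(clients, seconds, daily_session_capacity(clients, seconds))
```

Real traffic is never spread evenly over 24 hours, so treat these figures as upper bounds rather than targets.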

Tip

Ever wondered how the real webspace pros actively develop, monitor, and tweak their website content and server settings? Getting a feel for your average client session time, entry and exit pages, and referring sites can be tough if you've never tried to get your mind around these numbers before. Making content changes (banner ads, catalog entry points, and so on) and server adjustments (such as MaxClients, Timeout, and KeepAlive) without knowing this critical information is like taking potshots in the dark. Web log reporting suites such as Urchin, Webalizer, and Webtrends can help you make such determinations. On the basis of this information, you can make informed content and server changes and measure the results of your changes.

Server Loads and Hardware Requirements

The more sessions your sites generate per day, the more simultaneous daemon processes are required. In turn, daemon processes require RAM. If you don't have enough RAM, your server dips into swap space, the temporary overflow storage for RAM on the hard drive. When a server must "dip into swap," overall machine speed drops dramatically, because the hard drive is orders of magnitude slower than RAM. In short, if you want a fast web server, always add RAM before adding processor speed. If you serve only static content, even a simple 1GHz Fedora Core/Apache server with only 256 or 512 MB of RAM and a 100 Mbps network connection can surprise you with its speed.
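One way to apply this reasoning is to work backward from RAM to a MaxClients value that keeps the server out of swap. The sketch below is only an estimate under stated assumptions: the function name is hypothetical, and the figures for memory reserved for the operating system and memory per httpd process are placeholders that you would replace with measurements from your own server.

```python
def max_clients_for_ram(total_ram_mb: int, reserved_mb: int, per_process_mb: float) -> int:
    """Largest number of httpd processes that fits in RAM without swapping."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    # RAM left for Apache after the OS and other daemons take their share.
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        return 0
    return int(available_mb // per_process_mb)


# Assumed figures for illustration only: 512 MB of RAM, 128 MB reserved
# for the system, and 4 MB resident per httpd process.
print(max_clients_for_ram(512, 128, 4))
```

Dynamic content usually makes each process much larger than a static-only one, which shrinks the number of processes that fit.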

Dual-processor machines are rarely required for web servers unless the hosted sites require a lot of processor-intensive dynamic content operations, such as those found on a LAMP installation (Linux, Apache, MySQL, and PHP). While vast amounts of RAM are key for static-only content sessions, those who work with database access and dynamic content should do some system load testing, and based on the results consider upgrading to multiprocessor systems.

Tip

If you think you might eventually provide PostgreSQL, MySQL, Oracle, or other database access to your users, or you plan to move toward dynamic content based on Perl, Python, or PHP, you should consider future hardware purchases in that light. Even if you're serving only static content right now, go ahead and purchase or lease a multiprocessor-capable or SMP-based system. Just don't fully populate all the processors and RAM if initial cost is an issue. This way, you'll have an in-place upgrade option to serve your future growth, but won't take a huge hit to your bank account. Another natural benefit of this strategy is that SMP-based systems usually also have higher quality, server-grade components, as well as more RAM expansion options.

Setting up an Apache or FTP-based file share? Planning on running streaming multimedia web content? When the question of new hardware comes up, you may find yourself in an age-old discussion: Which is better, SCSI or ATA/IDE? The general wisdom holds that SCSI is faster because of the dedicated processing power of controllers on both the hard drive and the SCSI card, and the built-in speed advantages of command queuing, but that the cost of this speed is often prohibitive.

Not necessarily the case! SCSI hardware is still expensive, but it's no longer always faster than ATA/IDE hardware. Now that ATA drives use Ultra-DMA bus transfers at 33/66/133 MB/s instead of the old Programmed I/O (PIO) mode, and can do so without tying up other system buses or the processor itself, most UDMA-ATA systems can match all but the highest-end SCSI and SCSI/RAID systems. Combined with the newer multibus ATA RAID cards (see www.3ware.com) and newer Serial-ATA hardware, ATA/SATA is quickly becoming the preferred high-speed, low-cost solution. In some high-speed server configurations, Rackspace has actually achieved higher overall Linux application speed on a server-grade ATA/UDMA-based system than on identically configured servers running SCSI. In addition, with command queuing in the newer SATA II specification drives, this hardware platform is positioned to compete directly with high-end SCSI systems and outpace them in raw throughput by 2005, while undercutting the price of comparably configured SCSI-based systems.

Bottom line: cost isn't everything. Check the performance ratings for all the hardware that suits your needs. You may be surprised!

Benchmarking

If you're interested in web server performance tuning, you will need a way to track the effects of changes you make to the server daemon as well as your content. Apache provides its own benchmarking tool, called ab or Apache Benchmark. It's quite easy to use.

Tip

Unless you're just stress testing your hardware, run Apache Benchmark remotely to test "total throughput." This way you'll be able to identify not only hardware and configuration limitations, but also bottlenecks in your network and problems with your provider's connectivity. Whether you test an intranet server from a client on your LAN or an Internet server from a remote client located across the Internet, strive for tests that simulate real-world load on your hardware and infrastructure as closely as possible.

In the following sample session we invoke Apache Benchmark to test a web server across the network from a local Linux machine's command line:

   # ab -d -t 10 http://www.example.com/

The -d flag tells Apache Benchmark to deliver less verbose output by leaving out the "percentage served within XX [ms]" table. The -t flag, combined with the number 10, tells the program to test the server (pull down as many pages as it can) for 10 seconds. The last argument is the URL of the site to be tested. The command gives the following output:

   This is ApacheBench, Version 1.3d <$Revision: 1.67 $> apache-1.3
   Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
   Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

   Benchmarking www.example.com (be patient)
   Finished 3150 requests
   Server Software:        Apache/2.0.46
   Server Hostname:        www.example.com
   Server Port:            80

   Document Path:          /
   Document Length:        35 bytes
   Concurrency Level :     1
   Time taken for tests:   10.000 seconds
   Complete requests:      3150
   Failed requests:        0
   Broken pipe errors:     0
   Total transferred:      942149 bytes
   HTML transferred:       110285 bytes
   Requests per second:    315.00 [#/sec] (mean)
   Time per request:       3.17 [ms] (mean)
   Time per request:       3.17 [ms] (mean, across all concurrent requests)
   Transfer rate:          94.21 [Kbytes/sec] received

   Connnection Times (ms)
                 min  mean[+/-sd] median   max
   Connect:        0     0    0.0      0     0
   Processing:     2     2    4.4      2   135
   Waiting:        2     2    4.3      2   134
   Total:          2     2    4.4      2   135

The output, by default, is printed to the local terminal. However, there's a better way to do this. Tell ab to run its test and format the output in HTML with the -w flag and then scp the results back to the web server being tested:

   # ab -d -t 10 -w http://www.example.com/ >x
   Finished 3464 requests
   # scp x bob@example.com:/home/bob/web/html/test-output.html
   bob@example.com's password:
   x        100% |*****************************|   2189    00:00

Now, you can share the results with anyone who's interested by sending them to http://www.example.com/test-output.html:

   ...
       Requests per second:   348.47
   ...

Although these results are impressive, remember that your results will vary a great deal depending on the site's location in the network, network congestion, routing, server load levels, entry page size, static or dynamic content, and all the other variables that can affect web servers. The important thing is to use this tool for baseline data. It can help you identify misconfigurations, and give you something to work against for future comparisons.
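To make baseline comparisons less error-prone, you can pull the key figures out of saved ab reports with a short script. The following is a minimal sketch, not part of Apache: the function names are made up for illustration, and it assumes the plain-text report format shown earlier (not the HTML produced by -w), whose labels may differ between ab versions.

```python
import re

# Metrics worth tracking between runs, keyed to the pattern that captures
# the number. The labels assume the plain-text ab report shown earlier.
AB_PATTERNS = {
    "complete_requests": r"^\s*Complete requests:\s+(\d+)",
    "failed_requests": r"^\s*Failed requests:\s+(\d+)",
    "requests_per_second": r"^\s*Requests per second:\s+([\d.]+)",
    "transfer_rate_kbytes": r"^\s*Transfer rate:\s+([\d.]+)",
}


def parse_ab_report(text: str) -> dict:
    """Return the metrics found in a plain-text ab report as floats."""
    results = {}
    for name, pattern in AB_PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            results[name] = float(match.group(1))
    return results


def percent_change(baseline: float, current: float) -> float:
    """Change from baseline to current, as a percentage of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return (current - baseline) / baseline * 100.0


# Lines copied from the sample report earlier in this section.
SAMPLE = """\
Complete requests:      3150
Failed requests:        0
Requests per second:    315.00 [#/sec] (mean)
Transfer rate:          94.21 [Kbytes/sec] received
"""

baseline = parse_ab_report(SAMPLE)
# Compare against the 348.47 requests per second from the second run.
change = percent_change(baseline["requests_per_second"], 348.47)
print(f"Requests per second changed by {change:.1f}%")
```

Keeping the parsed numbers from each run in a simple log gives you the baseline to compare against after every configuration or content change.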

After you get a baseline, you can use the information to adjust your server performance. Some changes are easy, such as adjusting the settings for MaxClients, MinSpareServers, and MaxSpareServers in httpd.conf.

In other cases, you may need such testing results to encourage your site and content designers to minimize the entry page size, including image size, or make other changes to site content.

Note

At the time we wrote this book, Fedora Core 1 CDs shipped with the Apache package httpd-2.0.47-10, which included ab version 2.0.40-dev. This developmental version may not work properly for you. If this is the case and you have not run up2date yet, do so; the replacement package, httpd-2.0.48-1.2, should fix the problem. You can do this from the command line by running up2date httpd to upgrade just Apache and ab, or up2date to update the whole system.

Using Server-Status to Track Performance

Apache Benchmark is good as an external testing, website hammering, baselining, and bottleneck testing tool. However, if benchmarking doesn't give you all the information you need about the inner workings of the httpd service itself, consider using the server-status handler (provided by mod_status.so) that comes with Apache. It shows you what's going on inside Apache: how many httpd processes are running, your CPU usage, httpd server uptime, traffic levels, and other useful server-side information. To set up server-status, uncomment the following line in httpd.conf:

   ExtendedStatus On

You must also uncomment this section of httpd.conf:

   <Location /server-status>
       SetHandler server-status
       Order deny,allow
       Deny from all
       Allow from 192.168.127.
   </Location>

Be sure to enter your IP or network address in the next-to-last line. Then only those IP addresses or networks will be able to access data from server-status. Now, restart the Apache daemon with the /etc/init.d/httpd restart command. When it restarts, you can open this URL in your browser:

http://example.com/server-status?refresh

This server-status page will show you real-time data about your server, autorefreshed regularly. By using this page, combined with occasional stress testing and baselining using Apache Benchmark, and making small adjustments to settings in httpd.conf and your website content, you will be well on your way to having a real understanding of what's going on with your web server. You'll have a working knowledge of what your server's bottlenecks are, and most importantly, will be able to react to and customize your Apache server when the need arises.

Tip

The autorefresh setting may artificially inflate the numbers in your server-status output, because every refresh is itself a request to the server. If you just want to call up data as you need it, rather than running server-status constantly, request the URL without the ?refresh component.
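If you would rather collect these numbers from a script than read them in a browser, mod_status also offers a machine-readable variant of the page at server-status?auto, which returns plain "Key: value" lines. The sketch below shows one way to read it. The function names are hypothetical, the sample values are invented for illustration, and the exact set of keys depends on your Apache version and on whether ExtendedStatus is on. The client running the script must fall within the Allow from range configured above.

```python
from urllib.request import urlopen


def parse_status_auto(text: str) -> dict:
    """Turn 'Key: value' lines from server-status?auto into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines that have no colon
            status[key.strip()] = value.strip()
    return status


def fetch_status(url: str, timeout: float = 5.0) -> dict:
    """Fetch and parse a status URL such as http://example.com/server-status?auto."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("ascii", errors="replace")
    return parse_status_auto(body)


# Illustrative sample only; these values are invented, not measured.
SAMPLE = """\
Total Accesses: 1200
Uptime: 3600
BusyWorkers: 3
IdleWorkers: 7
"""

status = parse_status_auto(SAMPLE)
busy = int(status["BusyWorkers"])
idle = int(status["IdleWorkers"])
print(f"{busy} busy of {busy + idle} workers")
```

Calling fetch_status from a scheduled job at a modest interval gives you a history of worker usage without leaving an autorefreshing browser window open.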