22.1. Introduction

This chapter covers Apache 2.0. Apache 1.3 is the most widely used HTTP server in the world; it's dependable, robust, and extremely well documented. In fact, it's so well documented you don't need me to rehash the same old stuff. Apache 2.0 is a significant upgrade from 1.3; architecturally, there are many changes and improvements, and it's a bit easier to configure than 1.3. This chapter covers compiling Apache from sources; hosting multiple domains; serving up pages in different languages with Content Negotiation; using dynamic shared objects (DSOs), which are analogous to loadable kernel modules; and various other Apache tasks.

This chapter does not cover scripting or writing web applications. Those are large topics that are well taught by fine books such as these (all published by O'Reilly):

Apache Cookbook, by Ken Coar and Rich Bowen

Apache: The Definitive Guide, Third Edition, by Ben Laurie and Peter Laurie

Java Servlet and JSP Cookbook, by Bruce W. Perry

JavaServer Faces, by Hans Bergsten

Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington

PHP Cookbook, by David Sklar and Adam Trachtenberg

Tomcat: The Definitive Guide, by Jason Brittain and Ian F. Darwin

Upgrading to PHP 5, by Adam Trachtenberg

Web Database Applications with PHP and MySQL, Second Edition, by Hugh E. Williams and David Lane

When you're planning to build a web site, the first decision is what operating system to run it on. Apache runs on Windows, Unix, OS/2, and even BeOS. Since this is a Linux book, let's assume you'll run Apache on Linux. Your remaining decisions are a little harder:

Use Apache 1.3, or Apache 2.x?
Install from sources or packages?
Self-host, or use a service provider?

22.1.1 Use Apache 1.3, or Apache 2.x?

This is one of those "too many good choices" scenarios. Apache 1.3 is rock solid, well supported, and over-abundantly documented—a rare luxury in computing. It's also endlessly extensible and customizable via add-on modules. "If it ain't broke, don't fix it" is still a good maxim, especially in computing.

On the other hand, Apache 2 is a significant departure from the architecture of Apache 1.3. It's faster, it's more efficient, and it scales up a lot better than 1.3. The downside is that 1.3 modules don't work with 2.0 without being recompiled or, in some cases, rewritten. The good news is that Apache 2 has been around long enough to have a sizable number of useful modules available and ready to go to work. And it's only going to get better, as more developer energy is directed toward 2.0 and less toward 1.3.

Currently, the major remaining problem module is PHP. The maintainers of PHP warn against using PHP with Apache 2.0 on a production system, but by the time you read this, PHP 5 should be production-ready. Why should you care about PHP? If you plan to serve only static pages, you don't need it. However, if you want to generate dynamic content and build web applications, it's a good alternative to Perl, as it is a scripting language invented especially for web development. Learn all about it at http://us3.php.net or in Apache: The Definitive Guide.

22.1.1.1 Apache 2.0 differences

The most interesting changes to Apache 2.0 are its new multithreading architecture, which is configured using multiprocessing modules (MPMs), and a simplified configuration file. Most of the confusing and redundant directives have been removed from httpd.conf, so it's a lot easier to understand.

The default MPM is "Prefork." If you wish to try one of the others, you need to select it at compile time. These are the three MPM modes for Linux:

Prefork: The 1.3 model. A single parent process spawns child processes to handle requests. Spare children are kept sitting around just in case. Excess children are killed off after a prescribed length of time. (This is what the docs say. Really.) It permits using thread-unsafe libraries, so you can still use old modules that don't support multithreading.
Worker: A hybrid multiprocess, multithreaded server. It uses threads to serve requests, and because threads use fewer system resources than processes, it can handle a larger workload. Yet it retains much of the stability of a process-based server by keeping available multiple processes, each with many threads. Because threads share memory space, programs must be written to be "thread-safe."
PerChild: A fixed number of processes spawn varying numbers of threads. This is the most scalable option. Most radically, it allows daemon processes serving requests to be assigned a variety of different user IDs, which presents some interesting possibilities for secure, high-performance virtual hosting. It is also the trickiest option. If you are an ace programmer, Apache invites you to participate in testing and developing this module.

Prefork is the default, but users running high-demand servers might be interested in testing the Worker MPM. See http://httpd.apache.org/docs-2.0/mod/worker.html to learn how to implement the Worker MPM.
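Because the MPM is chosen at compile time, the choice comes down to one configure switch, --with-mpm. Here is a minimal sketch that only prints the command for review instead of running it. The function name mpm_configure_cmd and the install prefix /usr/local/apache2 are assumptions made for illustration, so substitute your own.

```shell
# Dry run: print the configure command for one of the Linux MPMs.
# PREFIX is an assumed install location; change it to suit your system.
PREFIX="${PREFIX:-/usr/local/apache2}"

# mpm_configure_cmd is a hypothetical helper, not part of Apache.
mpm_configure_cmd() {
    case "$1" in
        prefork|worker|perchild)
            echo "./configure --prefix=$PREFIX --with-mpm=$1"
            ;;
        *)
            echo "unknown MPM: $1 (expected prefork, worker, or perchild)" >&2
            return 1
            ;;
    esac
}

mpm_configure_cmd worker
```

Run the printed command from the top of the unpacked Apache source tree, then build and install as usual. Afterward, httpd -l lists the compiled-in modules, and the MPM you selected (worker.c, for example) should appear in that list.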

There are also platform-specific MPMs. If you are running Apache on one of these, be sure to select the appropriate MPM for the operating system:

BeOS: beos
Netware: mpm_netware
OS/2: mpmt_os2
Windows: mpm_winnt

22.1.2 Install from Sources or Packages?

Installing from packages is quickest, if you don't mind being stuck with whatever the package maintainer decides you should have. But it is not the easiest option—all the different distributions use different filenames and package names, so the Apache documentation doesn't make sense until you figure out the differences.

Installing from sources is a bit more work: you need to manually create a startup script, create an Apache owner and group, and set all of your compile-time options, including file locations. However, you have precise control over what goes in and, equally important, what is left out. And with Apache 2.0, you don't need to recompile the binary when you wish to add or remove a module. Modules can be built as Dynamic Shared Objects (DSOs), which are analogous to loadable kernel modules. Simply add or remove the modules as you need, without touching the httpd binary.
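To make the add-or-remove workflow concrete, here is a rough sketch that prints a configure command with DSO support enabled, followed by the httpd.conf lines that would load two modules. It assumes the usual naming convention for bundled modules (the file mod_NAME.so provides NAME_module) and a modules/ directory under the server root. The helper loadmodule_line is hypothetical, and rewrite and status are just example modules.

```shell
# Print the httpd.conf line that loads a bundled module built as a DSO.
# Assumes the conventional naming: mod_NAME.so provides NAME_module,
# and the .so files live in modules/ relative to ServerRoot.
loadmodule_line() {
    echo "LoadModule ${1}_module modules/mod_${1}.so"
}

# Dry run: the build-time switches for shared modules, then two example modules.
echo "./configure --enable-so --enable-mods-shared=most"
loadmodule_line rewrite
loadmodule_line status
```

To drop a module later, comment out its LoadModule line in httpd.conf and restart Apache (apachectl graceful, for instance). The httpd binary itself stays untouched.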

22.1.3 Self-Host or Use a Service Provider?

There are quite a number of hosting options to choose from. First, you can host your web server on a physically local machine, such as a machine in your home or office. This option offers convenience and control—if anything goes wrong, you're right there to deal with it. On the other hand, maintenance, security, and service are all up to you. And if your Internet connection goes down, there is no one but you to call up your upstream provider and nag them to fix it. The biggest downside is that bandwidth is expensive.

Another option is to use a commercial web-hosting service, where you pay a monthly fee for a certain amount of storage, bandwidth, and features on a shared server. This can be a nice option, if you find a quality web host. Typically, you get rafts of features: webmail, FTP, MySQL, PHP, CGI, Perl, POP/IMAP, SpamAssassin, streaming media, forum software, and more. If you plan to host more than one web site, look at CPanel reseller plans. CPanel is a web-based administration tool that is especially good for managing multiple sites. Shop carefully—the world is full of folks who get into the business without any idea of what they are getting into. Check out the Web Hosting Talk forums at http://www.webhostingtalk.com to learn about who's good and who's scammy. Don't go for the cheapest hosts—you can get good deals, but you generally get what you pay for. There is no such thing as "unlimited bandwidth," or any such nonsense.

The next option is to lease hardware and connectivity in a commercial data center and to install and maintain all the software yourself. You should see some cost savings for bandwidth, since you'll be on a shared line. A good facility will have backup power, redundant Internet connectivity, and good physical security. They will also monitor customers' bandwidth and server usage, and keep a tight rein on hogs and service abusers.

A shared server costs the least, but the disadvantages are obvious: you might not get a shell account, which means administration via a clunky web interface; and all it takes is one dunce customer to goof up the entire box by running system-hogging scripts or getting compromised. However, in a well-run data center, this can be a cost-effective solution. Look for a service provider offering User-Mode Linux (UML) hosting; this quite effectively isolates the different users from each other, and everyone gets shell accounts.

A leased, dedicated server is usually expensive, but if the lease cost includes on-site administration, it can be a good deal. Hardware maintenance is the responsibility of the data center, which may be a real hassle-saver for you.

If you want a dedicated server and don't want to share with other customers, usually the most cost-effective plan is to buy your own machine and rent rack space. Many data centers offer on-site administration and hardware support on an as-needed basis.

Beware of weirdo bandwidth billing methods. Be very clear up front how you will be charged for your bandwidth usage. One common dodge is to play games with aggregate usage: you think you're getting 1 gigabyte of data transfer per month, but the provider might have sneaky daily or even hourly "burst" limits, and penalties for exceeding them. Another dodge is vague service-level agreements. These should specify a guaranteed uptime and how quickly they will respond to an error ticket. Make sure you have a written agreement that explicitly spells out every little thing, and if there is anything you don't understand, don't sign until they clarify it to your satisfaction.
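A little arithmetic shows why burst limits matter. This sketch spreads the 1-gigabyte monthly allowance from the example evenly across days and hours. It assumes 1 GB equals 1024 MB and a 30-day month; a real provider's limits will be whatever the contract says.

```shell
# Spread a monthly transfer allowance evenly across days and hours.
# Assumptions: 1 GB = 1024 MB, 30-day month; integer division rounds down.
monthly_mb=1024
days=30

per_day_mb=$(( monthly_mb / days ))
per_hour_kb=$(( monthly_mb * 1024 / (days * 24) ))

echo "average per day:  ${per_day_mb} MB"
echo "average per hour: ${per_hour_kb} KB"
```

That works out to about 34 MB per day, or roughly 1456 KB per hour. A single busy hour can easily exceed that average while your monthly total stays well under 1 gigabyte, which is exactly where an hourly burst penalty would bite.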

Keep in mind that the more you want, the more it will cost. There are no free rides, so be suspicious of too-good-to-be-true deals.
