Team LiB
Previous Section Next Section

8.2. Log Manipulation

Apache does a good job with log format definition, but some features are missing, such as log rotation and log compression. Some reasons given for their absence are technical, and some are political:

  • Apache usually starts as root, opens the log files, and proceeds to create child processes. Child processes inherit log file descriptors at birth; because of different permission settings, they would otherwise be unable to write to the logs. If Apache were to rotate the log files, it would have to create new file descriptors, and a mechanism would have to exist for children to "reopen" the logs.

  • Some of the Apache developers believe that a web server should be designed to serve web pages, and should not concern itself with tasks such as log rotation.

Of course, nothing prevents third-party modules from implementing any kind of logging functionality, including rotation. After all, the default logging is done through a module ( mod_log_config) without special privileges. However, at the time of this writing no modules exist that log to files and support rotation. There has been some work done on porting Cronolog (see Section 8.2.2.2 in the Section 8.2.2 section) to work as a module, but the beta version available on the web site has not been updated recently.

8.2.1. Piped Logging

Piped logging is a mechanism used to offload log manipulation from Apache and onto external programs. Instead of giving a configuration directive the name of the log file, you give it the name of a program that will handle logs in real time. A pipe character is used to specify this mode of operation:

CustomLog "|/usr/local/apache/bin/piped.pl /var/www/logs/piped_log" combined

All logging directives mentioned so far support piped logging. Many third-party modules also try to support this way of logging.

External programs used this way are started by the web server and restarted later if they die. They are started early, while Apache is still running as root, so they are running as root, too. Bugs in these programs can have significant security consequences. If you intend to experiment with piped logging, you will find the following proof-of-concept Perl program helpful to get you started:

#!/usr/bin/perl
   
use IO::Handle;
   
# check input parameters
if ((!@ARGV)||($#ARGV != 0)) {
    print "Usage: piped.pl <log filename>\n";
    exit;
}
   
# open the log file for appending, configuring
# autoflush to avoid potential data loss
$logfile = shift(@ARGV);
open(LOGFILE, ">>$logfile") || die "Failed to open $logfile for writing";
LOGFILE->autoflush(1);
   
# handle log entries until the end
while (my $logline = <STDIN>) {
    print LOGFILE $logline;
}
   
close(LOGFILE);

If you prefer C to Perl, every Apache distribution comes with C-based piped logging programs in the support/ folder. Use these programs for skeleton source code.

Though the piped logging functionality serves the purpose of off-loading the logging task to an external program, it has some drawbacks:

  • It increases the complexity of the system since Apache must control external processes.

  • One process is created for every piped logging instance configured in the configuration. This makes piped logging impractical for virtual hosting systems where there are hundreds, or possibly thousands, of different hosts.

  • The external programs run as the user that has started the web server, typically root. This makes the logging code a big liability. Special care must be taken to avoid buffer overflows that would lead to exploitation.

8.2.2. Log Rotation

Because no one has unlimited storage space available, logs must be rotated on a regular basis. No matter how large your hard disk, if you do not implement log rotation, your log files will fill the partition.

Log rotation is also very important to ensure no loss of data. Log data loss is one of those things you only notice when you need the data, and then it is too late.

There are two ways to handle log rotation:

  • Write a script to periodically rotate logs.

  • Use piped logging and external helper binaries to rotate logs in real time.

8.2.2.1 Periodic rotation

The correct procedure to rotate a log from a script is:

  1. Move the log file to another location.

  2. Gracefully restart Apache.

  3. Wait a long time.

  4. Continue to manipulate (e.g., compress) the moved log file.

Here is the same procedure given in a shell script, with the added logic to keep several previous log files at the same location:

#!/bin/sh
   
cd /var/www/logs
mv access_log.3.gz access_log.4.gz
mv access_log.2.gz access_log.3.gz
mv access_log.1.gz access_log.2.gz
mv access_log accesss_log.1
/usr/local/apache/bin/apachectl graceful
sleep 600
gzip access_log.1

Without the use of piped logging, there is no way to get around restarting the server; it has to be done for it to re-open the log files. A graceful restart (that's when Apache patiently waits for a child to finish with the request it is processing before it shuts it down) is recommended because it does not interrupt request processing. But with a graceful restart, the wait in step 3 becomes somewhat tricky. An Apache process doing its best to serve a client may hang around for a long time, especially when the client is slow and the operation is long (e.g., a file download). If you proceed to step 4 too soon, some requests may never be logged. A waiting time of at least 10 minutes is recommended.

Never attempt to manipulate the log file without restarting the server first. A frequent (incorrect) approach to log rotation is to copy the file and then delete the original. The problem with this (on Unix systems) is the file will not be completely deleted until all open programs stop writing to it. In effect, the Apache processes will continue to log to the same (but now invisible) file. The invisible file will be deleted the next time Apache is shut down or restarted, but all the data logged since the "deletion" and until then will be lost. The purpose of the server restart, therefore, is to get Apache to let go of the old file and open a new file at the defined location.


Many Linux distributions come with a utility called logrotate, which can be used to rotate all log files on a machine. This handy program takes care of most of the boring work. To apply the Apache log rotation principles to logrotate, place the configuration code given below into a file /etc/logrotate.d/apache and replace /var/www/logs/* with the location of your log files, if different:

/var/www/logs/* {
    # rotate monthly
    monthly
   
    # keep nine copies of the log
    rotate 9
   
    # compress logs, but with a delay of one rotation cycle
    compress
    delaycompress
   
    # restart the web server only once, not for
    # every log file separately
    sharedscripts
   
    # gracefully restart Apache after rotation
    postrotate
        /usr/local/apache/bin/apachectl graceful > /dev/null 2> /dev/null
    endscript
}

Use logrotate with the -d switch to make it tell you what it wants to do to log files without doing it. This is a very handy tool to verify logging is configured properly.

8.2.2.2 Real-time rotation

The rotatelogs utility shipped with Apache uses piped logging and rotates the file after a specified time period (given in seconds) elapses:

CustomLog "|/usr/local/apache/bin/rotatelogs /var/www/logs/access_log 
300" custom

The above rotates the log every five minutes. The rotatelogs utility appends the system time (in seconds) to the log name to keep filenames unique. For the configuration directive given above, you will get filenames such as these:

access_log.1089207300
access_log.1089207600
access_log.1089207900
...

Alternatively, you can use strftime-compatible (see man strftime) format strings to create a custom log filename format. The following is an example of automatic daily log rotation:

CustomLog "|/usr/local/apache/bin/rotatelogs \
/var/www/logs/access_log.%Y%m%d 86400" custom

Similar to rotatelogs, Cronolog (http://cronolog.org) has the same purpose and additional functionality. It is especially useful because it can be configured to keep a symbolic link to the latest copy of the logs. This allows you to find the logs quickly without having to know what time it is.

CustomLog "|/usr/local/apache/bin/cronolog \
/var/www/logs/access_log.%Y%m%d --link=/var/www/logs/access_log" custom

A different approach is used in Cronolog to determine when to rotate. There is no need to specify the time period. Instead, Cronolog rotates the logs when the filename changes. Therefore, it is up to you to design the file format, and Cronolog will do the rest.

8.2.3. Issues with Log Distribution

There are two schools of thought regarding Apache log configurations. One is to use the CustomLog and ErrorLog directives in each virtual host container, which creates two files per each virtual host. This is a commonsense approach that works well but has two drawbacks:

  • It does not scale well

  • Two files per virtual host on a server hosting a thousand web sites equals two thousand file descriptors. As the number of sites grows, you will hit the file descriptor limit imposed on Apache by the operating system (use ulimit -a to find the default value). Even when the file descriptor issue is left aside, Apache itself does not scale well over a thousand hosts, so methods of shared hosting that do not employ virtual hosts must be used. This problem was covered in detail in Chapter 6.

  • Logs are not centralized

  • Performing log postprocessing is difficult (for security, or billing purposes) when you do not have logging information in a single file.

To overcome these problems, the second school of thought regarding configuration was formed. The idea is to have only two files for all virtual hosts and to split the logs (creating one file per virtual host) once a day. Log post-processing can be performed just before the splitting. This is where the vcombined access log format comes into play. The first field on the log line, the hostname, is used to determine to which virtual host the entry belongs. But the problem is the format of the error log is fixed; Apache does not allow its format to be customized, and we have no way of knowing to which host an entry belongs.

One way to overcome this problem is to patch Apache to put a hostname at the beginning of every error log entry. One such patch is available for download from the Glue Logic web site (http://www.gluelogic.com/code/apache/). Apache 2 offers facilities to third-party modules to get access to the error log so I have written a custom module, mod_globalerror, to achieve the same functionality. (Download it from http://www.apachesecurity.net/.)

    Team LiB
    Previous Section Next Section