8.5. Log Analysis
Successful log analysis begins long before the need for it arises. It
starts with the Apache installation, when you are deciding what to
log and how. By the time something that requires log analysis
happens, you should have the information to perform it.
If you are interested in log forensics, then Scan of the Month 31
(http://www.honeynet.org/scans/scan31/) is the web site you should
visit. As an experiment, Ryan C. Barnett kept an Apache proxy open
for a month and recorded every transaction in detail. It resulted in
almost 300 MB of raw logs. The site includes several analyses of the
abuse techniques seen in the logs.
A complete log analysis strategy consists of the following steps:
1. Ensure all Apache installations are configured to log sufficient
   information, prior to any incidents.
2. Determine all the log files where relevant information may be
   located. The access log and the error log are the obvious choices,
   but many other logs may contain useful information: the suEXEC
   log, the SSL log (it's in the error log on Apache 2), the audit
   log, and possibly application logs.
3. The access log is likely to be quite large. You should try to
   remove the irrelevant entries (e.g., requests for static files)
   from it to speed up processing. Watch carefully what is being
   removed; you do not want important information to get lost.
4. In the access log, try to group requests into sessions, using
   either the IP address or a session identifier if one appears in
   the logs. Having the unique ID token in the access log helps a lot,
   since you can perform access log analysis much faster than you
   could with the full audit log produced by mod_security. The audit
   log is better suited to looking at individual requests.
5. Do not forget that the attacker could be working from multiple IP
   addresses. Attackers often perform reconnaissance from one point
   but attack from another.
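
The filtering and grouping steps above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
log is assumed to use the Common Log Format prefix, the list of static
extensions is hypothetical, and the names is_static and
group_by_client are invented for this example. To reduce the risk of
discarding evidence, a request is dropped only when it targets a
static file, carries no query string, and was answered with status 200
or 304.

```python
import re
from collections import defaultdict

# Assumed Common Log Format prefix:
# host ident user [time] "request" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical list of extensions treated as static content.
STATIC_EXT = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def is_static(request):
    """Return True for a plain static-file request with no query string."""
    parts = request.split()
    if len(parts) < 2:
        return False  # malformed request lines are kept for review
    uri = parts[1]
    if "?" in uri:
        return False  # anything carrying parameters is kept
    return uri.lower().endswith(STATIC_EXT)


def group_by_client(lines):
    """Drop uninteresting static requests and group the rest by client IP.

    Returns (sessions, dropped): sessions maps an IP address to its log
    lines in original order, dropped counts the removed entries.
    """
    sessions = defaultdict(list)
    dropped = 0
    for line in lines:
        line = line.rstrip("\n")
        m = LINE_RE.match(line)
        if m is None:
            # Never silently discard lines that do not parse.
            sessions["<unparsed>"].append(line)
            continue
        if is_static(m.group("request")) and m.group("status") in ("200", "304"):
            dropped += 1
            continue
        sessions[m.group("host")].append(line)
    return sessions, dropped
```

Calling group_by_client(open("access_log")) would return the grouped
entries together with the number of dropped lines, which makes it easy
to check how much was removed. Grouping by IP address alone will split
the activity of an attacker who uses several addresses, so the groups
are a starting point rather than a final answer.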
Log analysis is a long and tedious process. It involves looking at
large quantities of data and trying to make sense of it. Traditional
Unix tools (e.g., grep, sed, awk, and sort) and the command line are
very good for text processing and, therefore, are a good choice for
log file processing. But they can be difficult to use with web server
logs because such logs contain a great deal of information. The
bigger problem is that attackers often utilize evasion methods that
must be taken into account during analysis, so a special tool is
required. I have written one such tool for this book: logscan. It
parses log lines and allows field names to be used with regular
expressions. For example, the following will examine the access log
and list all requests whose status code is 500:
$ logscan access_log status 500
The parameters are the name of the log file, the field name, and the
pattern to be used for comparison. By default, logscan understands
the following field names, listed in the order in which they appear
in access log entries:
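
As an illustration of matching a regular expression against a named
field, here is a minimal Python sketch. It is not logscan and makes no
claim about how logscan is implemented. The field names in FIELDS are
hypothetical labels for the combined log format, not necessarily the
names logscan uses, and parse_line and scan are names invented for
this example.

```python
import re

# Hypothetical field names for the combined log format, in log order.
# These are assumptions for this sketch, NOT logscan's own field list.
FIELDS = (
    "remote_addr", "ident", "user", "time", "request",
    "status", "bytes", "referer", "user_agent",
)

# The referer and user-agent pair is optional so that plain Common Log
# Format lines also parse. Escaped quotes inside fields are not handled.
COMBINED_RE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)'
    r'(?: "([^"]*)" "([^"]*)")?'
)


def parse_line(line):
    """Return a dict of field name to value, or None if the line does not parse."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    return dict(zip(FIELDS, m.groups()))


def scan(lines, field, pattern):
    """Return the lines whose given field matches the regular expression."""
    if field not in FIELDS:
        raise ValueError("unknown field: " + field)
    rx = re.compile(pattern)
    matches = []
    for line in lines:
        line = line.rstrip("\n")
        entry = parse_line(line)
        if entry is None:
            continue
        value = entry[field]
        # Optional fields are None when absent from the line.
        if value is not None and rx.search(value):
            matches.append(line)
    return matches
```

With these definitions, scan(open("access_log"), "status", "^500$")
selects the same kind of entries as the command shown earlier.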
logscan also attempts to counter evasion techniques by performing the
following operations:
Decode URL-encoded characters.
Remove multiple occurrences of the slash character.
Remove self-referencing folder occurrences.
Detect null byte attacks.
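
The four operations listed above can be sketched in Python as follows.
This is an assumption-laden illustration of the general technique, not
a description of logscan's code. The function name normalize_uri is
invented for this example, the URI is decoded only once, and slash and
self-reference cleanup is applied to the path only, leaving the query
string otherwise untouched.

```python
import re
from urllib.parse import unquote


def normalize_uri(uri):
    """Return (normalized_uri, has_null_byte) for a raw request URI."""
    # Split before decoding so that an encoded "?" (%3F) stays in the path.
    path, sep, query = uri.partition("?")

    # Decode URL-encoded characters. latin-1 maps every byte to exactly
    # one character, so decoding never fails on odd byte sequences.
    path = unquote(path, encoding="latin-1")
    query = unquote(query, encoding="latin-1")

    # Detect null byte attacks (%00 anywhere in the path or the query).
    has_null_byte = "\x00" in path or "\x00" in query

    # Remove multiple occurrences of the slash character.
    path = re.sub(r"/{2,}", "/", path)

    # Remove self-referencing folder occurrences ("/./" and a trailing "/.").
    path = re.sub(r"/(?:\./)+", "/", path)
    path = re.sub(r"/\.$", "/", path)

    return path + sep + query, has_null_byte
```

For example, normalize_uri("/cgi-bin//./test%2Ecgi") yields the path
/cgi-bin/test.cgi with the null byte flag unset, so a pattern written
for the plain script name still matches the disguised request.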
You will find the following web server log forensics resources useful: