8.3. Building the Log Parser

We are finally ready to start writing some code. The first thing we do is open our script and check whether a log filename was passed (the only mandatory argument). If not, the script dies and prints the script usage; otherwise, it continues:

#!/usr/bin/perl

use strict;

# Check for mandatory arguments or print out usage info
unless (@ARGV) { 
 die "Usage: $0 LogFile\n"; 
}

Now that we know a command-line argument was passed, we assume it is the log filename and attempt to open the file. If we cannot open the file, the script dies and prints an error message:

# Attempt to open the input file
open(IN, "<", $ARGV[0]) or die "ERROR: Can't open file $ARGV[0]: $!\n";

Before we go any further, we need to be familiar with the structure and format of the log file we are parsing. Provided that your proxy server logs the raw HTTP requests and responses (most do), the logic our Perl script uses to generate test requests should be virtually identical from one proxy to another, with the exception of the delimiter used to separate each log file entry. In the Burp log file shown in Example 8-6, each request and response is separated by a consistent delimiter: a line of 54 equals signs ("=" x 54 in Perl).

Example 8-6. Excerpt from Burp proxy log file

======================================================
http://www.myserver.com/192.168.0.1:80
======================================================
GET /blah.jsp HTTP/1.0
Accept: */*
Accept-Language: en-us
Pragma: no-cache
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)
Host: www.myserver.com
Proxy-Connection: Keep-Alive


======================================================
HTTP/1.1 200 OK
Server: Apache/1.3.27 (Unix)
Date: Sun, 11 Jul 2004 17:21:01 GMT
Content-type: text/html; charset=iso-8859-1
Connection: close

<html>
  <head>
    <title>Test Page</title>
  </head>
  <body>
     <P>Hello World!</P>
  </body>
</html>
======================================================

Going back to the script, now that the file is open we change the input record separator ($/) from its default (a newline) to the delimiter of our log file entries, and then read the file's contents into an array (@logData). The separator must be set before the read; that way, each array member in @logData is a separate log entry rather than a single line.

# Change the input record separator to select entire log entries
$/ = "=" x 54;

# Populate logData with contents of input file
my @logData = <IN>;
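
To see how the record separator carves up a log, the following standalone sketch (not part of parseLog.pl) reads a small in-memory sample laid out like Example 8-6. The sample text, the $sample and @records names, and the use of local to limit the change to $/ are assumptions made for illustration only:

```perl
use strict;
use warnings;

# Hypothetical sample in the layout of Example 8-6:
# delimiter, URL line, delimiter, request, delimiter
my $delim  = "=" x 54;
my $sample = join("\n",
    $delim,
    "http://www.myserver.com/192.168.0.1:80",
    $delim,
    "GET /blah.jsp HTTP/1.0",
    "Host: www.myserver.com",
    "",
    "",
    $delim,
    "",
);

# Open an in-memory filehandle on the sample string
open(my $in, "<", \$sample) or die "ERROR: Can't open in-memory sample: $!\n";

# Read delimiter-terminated records instead of newline-terminated lines
my @records;
{
    local $/ = $delim;    # restored automatically when this block ends
    @records = <$in>;
}
close($in);

# Element 1 of each split record is the first line of text after the
# newline that followed the previous delimiter
foreach my $record (@records) {
    my @lines = split(/\n/, $record);
    my $first = defined $lines[1] ? $lines[1] : "(none)";
    print "first line: $first\n";
}
```

Each record ends with the delimiter itself, and every record after the first begins with the newline that followed the previous delimiter. That leading newline is why the request line turns up at index 1, not index 0, once a record is split on newlines.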

Next, we loop through each log file entry and parse the first line of the request to determine if it is a GET or a POST request:

# Loop through each request and parse it
my ($request,$logEntry, @requests);
foreach $logEntry (@logData) {

 # Create an array containing each line of the raw request
 my @logEntryLines = split(/\n/, $logEntry);

 # Create an array containing each element of the first request line
 my @requestElements = split(/ /, $logEntryLines[1]);
 
 # Only parse GET and POST requests
 if ($requestElements[0] eq "GET" || $requestElements[0] eq "POST" ) {

For GET requests, we simply parse the first two members of the @requestElements array. These two elements should consist of the method (GET) and the resource being requested. Because all spaces in the requested URL must be URL-encoded, the query string (if present) should be included in the second member of the array, along with the filename or application resource name. For GET requests, we go ahead and print this string as output and follow it with a newline:

  if ($requestElements[0] eq "GET" ) {
   print "$requestElements[0] $requestElements[1]\n";
  }
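
As a standalone illustration (not part of parseLog.pl), the sketch below applies the same split to a hypothetical request line whose query string contains an encoded space. The URL, its parameter names, and the $requestLine and $testRequest variables are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical request line; the space inside the q parameter is encoded as %20
my $requestLine = "GET /search.jsp?q=log%20parser&page=2 HTTP/1.0";

# Same split the parser applies to the first line of each log entry
my @requestElements = split(/ /, $requestLine);

# Element 0 is the method, element 1 is the resource plus its query string,
# and element 2 is the protocol version, which the parser ignores
my $testRequest = "$requestElements[0] $requestElements[1]";
print "$testRequest\n";
```

Because the only literal spaces in the line are the two that separate method, resource, and protocol version, the split always yields three elements and the query string stays attached to the resource name.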

For POST requests, we need to do a bit more processing. Specifically, we parse the same two data elements we parsed for the GET requests (except here the method should be equal to POST), but we also have to parse out the POST data string from the body of the request. Based on our log file format, the POST data string should be the second-to-last data element in the @logEntryLines array (this is the array that contains each line of the specific log entry we are parsing). Then we append the POST data to the resource name as though it were a query string, and we print the line:

  # POST request data is appended after the question mark
  if ($requestElements[0] eq "POST" ) {
   print $requestElements[0]." ".$requestElements[1]."?".$logEntryLines[-2]."\n";
  }
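
The following standalone sketch (again, not part of parseLog.pl) walks through the same steps for a single hypothetical POST entry. The entry's URL, headers, and form fields are made up, and the sketch assumes the body is followed by a newline and then the delimiter, as the response body is in Example 8-6:

```perl
use strict;
use warnings;

# Hypothetical POST entry as it would appear in @logData: it begins with the
# newline that followed the previous delimiter and ends with the next delimiter
my $logEntry = join("\n",
    "",
    "POST /login.jsp HTTP/1.0",
    "Content-Type: application/x-www-form-urlencoded",
    "Content-Length: 26",
    "",
    "user=guest&password=secret",
    "=" x 54,
);

# Split the entry into lines, then split the request line into its elements
my @logEntryLines   = split(/\n/, $logEntry);
my @requestElements = split(/ /, $logEntryLines[1]);

# The last line is the delimiter, so the POST data is the second-to-last line
my $postData = $logEntryLines[-2];

# Append the POST data to the resource name as though it were a query string
my $testRequest = $requestElements[0] . " " . $requestElements[1] . "?" . $postData;
print "$testRequest\n";
```

Indexing from the end of the array with -2 avoids having to count headers, which vary from request to request. It does rely on the body occupying a single line immediately before the delimiter.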

Finally, we close our if and foreach blocks, and the script exits:

 } # End check for GET or POST
} # End loop for input file entries

Now we can use our parseLog.pl script to print out a listing of test request data in a very simple and consistent format. The complete parseLog.pl code is included at the end of this chapter.