8.1. Web Application Environment
The term web application typically implies certain attributes of an application. Most often, it means that the application is browser-based; that is, you can access it using a standard web browser such as Internet Explorer or Netscape Navigator. For the purposes of our discussions in the next two chapters, we assume that web applications communicate using the Hypertext Transfer Protocol (HTTP) and that users access them via a web browser.
Most web applications use HTTP to exchange data between the client (typically a web browser) and the server. HTTP works through a series of requests from the client and associated server responses back to the client. Each request is independent of the others and results in exactly one server response. A detailed familiarity with HTTP requests and responses is critical to testing web applications effectively. Example 8-1 shows what a typical raw HTTP request looks like.
Example 8-1. Typical HTTP GET request
GET /public/content/jsp/news.jsp HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)
Host: www.myserver.com
Connection: Keep-Alive
The first line of the HTTP request contains the request method (in this case, the GET method), followed by the file or resource being requested. The version of HTTP the client uses is also appended to the first line of the request. Following this line are various request headers and associated values, one header per line.
Several HTTP request methods are defined in the HTTP RFC; however, by far the two most common are the GET and POST methods. The primary difference between these methods is in how application parameters are passed to the file or resource being requested. Requests for resources that do not include parameter data are typically made using the GET method (as shown in Example 8-1). GET requests, however, can also include parameter data in the query string portion of the request. The query string normally consists of at least one parameter name/value pair appended to the end of the resource being requested. A question mark (?) separates the resource name from the query string data, and an equals sign (=) separates each parameter name from its value. You can pass multiple parameter name/value pairs in the query string by joining them with an ampersand (&). Example 8-2 shows the same GET request from Example 8-1, but it contains request data in the query string.
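The structure just described can be sketched in a few lines of code. The following is a minimal illustration in Python (not the Perl used for the scanner in this chapter); the function name parse_request and the shortened sample request are assumptions made for the example, and the parser ignores edge cases such as folded headers or repeated header names.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, target, version, headers)."""
    # Headers end at the first blank line; anything after it is the body.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: METHOD SP TARGET SP VERSION
    method, target, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# Shortened version of the request in Example 8-1 (hypothetical sample data).
RAW_REQUEST = (
    "GET /public/content/jsp/news.jsp HTTP/1.1\r\n"
    "Accept-Language: en-us\r\n"
    "Host: www.myserver.com\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)

method, target, version, headers = parse_request(RAW_REQUEST)
print(method, target, version)  # GET /public/content/jsp/news.jsp HTTP/1.1
print(headers["host"])          # www.myserver.com
```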
Example 8-2. HTTP GET request with query string data
GET /public/content/jsp/news.jsp?id=2&view=F HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)
Host: www.myserver.com
Connection: Keep-Alive
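To make the query string syntax concrete, here is a small Python sketch (again, an illustration rather than part of the Perl scanner) that splits the request target from Example 8-2 into its resource name and parameter name/value pairs using the standard urllib.parse module, and then rebuilds it.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Request target taken from the first line of Example 8-2.
target = "/public/content/jsp/news.jsp?id=2&view=F"

parts = urlsplit(target)
print(parts.path)   # /public/content/jsp/news.jsp
print(parts.query)  # id=2&view=F

# Split the query string on "&" and "=" into (name, value) pairs.
params = parse_qsl(parts.query)
print(params)       # [('id', '2'), ('view', 'F')]

# Rebuild the target from its pieces; urlencode also percent-encodes
# any characters that are not safe in a query string.
rebuilt = parts.path + "?" + urlencode(params)
print(rebuilt == target)  # True
```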
The POST request method is very similar to the GET method, with the exception of how parameter name/value pairs are passed to the application. A POST request passes name/value pairs with the same syntax as that used in a GET request, but it places the data string in the body of the request, after all request headers and a blank line. The Content-Length header is also passed in a POST request to indicate to the HTTP server the length of the POST data string. The Content-Length header value must contain the exact number of bytes in the POST data string (for plain ASCII data such as this, that is the same as the number of characters). Example 8-3 shows the request from Example 8-2, but this time using the POST method.
Example 8-3. HTTP POST request with data
POST /public/content/jsp/news.jsp HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)
Host: www.myserver.com
Content-Length: 11
Connection: Keep-Alive

id=2&view=F
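The relationship between the POST body and the Content-Length header can be sketched as follows. This Python illustration is not taken from the chapter's Perl scanner; the helper name build_post is hypothetical, and the Content-Type header it adds does not appear in Example 8-3 (it is included here on the assumption that the server expects form-encoded data to be labeled as such).

```python
from urllib.parse import urlencode


def build_post(path, host, params):
    """Return a raw HTTP POST request (as bytes) carrying form parameters."""
    # Same name=value&name=value syntax as a query string, but sent as the body.
    body = urlencode(params).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        # Assumption: not shown in Example 8-3, but commonly sent with form data.
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length is computed from the encoded body, in bytes.
        f"Content-Length: {len(body)}\r\n"
        "Connection: Keep-Alive\r\n"
        "\r\n"  # blank line separates the headers from the body
    ).encode("ascii")
    return head + body


request = build_post(
    "/public/content/jsp/news.jsp",
    "www.myserver.com",
    [("id", "2"), ("view", "F")],
)
print(request.decode("ascii"))
```

Computing the length from the encoded body, rather than typing it by hand, avoids the mismatches that cause servers to truncate the data or wait for bytes that never arrive.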
Each HTTP request results in a response from the server. The structure of the HTTP response is somewhat similar to that of a request, consisting of the HTTP version and response code in the first line, followed by a series of response headers and values. The HTML output the browser renders is included in the body of the HTTP response following the response headers. Unlike the HTTP response headers, the HTML output is rendered to the user and can be viewed in its raw state using the View Source option in most web browsers. Example 8-4 shows a typical HTTP response.
Example 8-4. HTTP response
HTTP/1.1 200 OK
Date: Sat, 10 Jul 2004 23:45:12 GMT
Server: Apache/1.3.26 (Unix)
Cache-Control: no-store
Pragma: no-cache
Content-Type: text/html; charset=ISO-8859-1

<HTML>
<HEAD>
<TITLE>My News Story</TITLE>
</HEAD>
<BODY>
<H1>My News Story</H1>
<P>This is a simple news story.</P>
</BODY>
</HTML>
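A response such as the one in Example 8-4 can be taken apart in much the same way as a request. The sketch below is a Python illustration (not the chapter's Perl code); the function name parse_response and the shortened sample response are assumptions for the example, and real responses may need extra handling for chunked or compressed bodies.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, code, reason, headers, body)."""
    # The first blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Status line: VERSION SP CODE SP REASON (the reason phrase may be empty).
    version, _, rest = lines[0].partition(" ")
    code, _, reason = rest.partition(" ")

    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# Shortened version of the response in Example 8-4 (hypothetical sample data).
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/1.3.26 (Unix)\r\n"
    "Content-Type: text/html; charset=ISO-8859-1\r\n"
    "\r\n"
    "<HTML><BODY><H1>My News Story</H1></BODY></HTML>"
)

version, code, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, code, reason)    # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html; charset=ISO-8859-1
print(body)
```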
The response status code is a three-digit number returned in the first line of the HTTP response. An HTTP server can return many different status codes, all classified by the first of the three digits. Table 8-1 shows a breakdown of the five general status code categories.
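Because the category depends only on the first digit, classifying a status code takes very little code. The following Python sketch is illustrative only; the category names follow the wording of the HTTP specification and may differ slightly from the labels used in Table 8-1, and the function name status_class is hypothetical.

```python
# Category names as used in the HTTP specification, keyed by first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the general category for a three-digit HTTP status code."""
    # Integer division by 100 isolates the first digit (e.g. 404 // 100 == 4).
    return STATUS_CLASSES.get(code // 100, "Unknown")


for code in (200, 302, 404, 500):
    print(code, status_class(code))
```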
You can use Secure Sockets Layer (SSL), or its successor Transport Layer Security (TLS), to encrypt the communications channel between the web browser client and server. Although this is usually referred to as HTTPS, underneath the encryption the HTTP requests and responses still look the same. Many people think that simply because HTTPS is used, the application or server is "secure" and resilient to attack. It is important to realize that SSL merely protects the request and response data while in transit, so that someone eavesdropping on the network or otherwise intercepting the data cannot read it. The underlying data and associated application, however, are still susceptible to attack by the end user, who sees and controls the unencrypted requests.
8.1.3. Perl and LWP
We will use the Perl scripting language to develop the web application scanner outlined in this chapter. Perl's extensive support for regular expressions and its platform independence make it a great language with which to develop our scanner. We have kept the code syntax as straightforward and easy to follow as possible, and we will explain each block of code as we develop it. We will use the libwww-perl user agent module (LWP::UserAgent), which is included with many Perl installations. LWP is essentially a WWW client library that allows you to easily make HTTP requests from a Perl script. If you want to learn more about LWP, read Perl and LWP, by Sean Burke (O'Reilly).
Another nice thing about LWP is that it supports HTTP requests over SSL as long as the Crypt::SSLeay Perl module and OpenSSL libraries are installed. (On recent LWP releases, HTTPS support is instead provided by the separate LWP::Protocol::https module, which relies on IO::Socket::SSL.) If you want to use the scanner on HTTPS web applications, ensure that the required SSL module and OpenSSL libraries are installed and working.
8.1.4. Web Application Vulnerabilities
When we use the term web application vulnerabilities, we are referring to vulnerabilities that result from poorly written application code. These can range from application components that do not properly validate external input before processing it (which leads to flaws such as SQL injection), to flaws in the code that fail to properly authenticate users before allowing access. The nature and classifications of web application vulnerabilities are outside the scope of this chapter, but we give a quick overview of these vulnerabilities in the sidebar "Open Web Application Security Project."