8.1. Web Application Environment

The term web application typically implies certain attributes. Most often, it means that the application is browser-based; i.e., you can access it using a standard web browser such as Internet Explorer or Netscape Navigator. For the purposes of our discussions in the next two chapters, we assume that web applications communicate using the Hypertext Transfer Protocol (HTTP) and that users access them via a web browser.

8.1.1. HTTP

Most web applications use HTTP to exchange data between the client (typically a web browser such as Internet Explorer or Netscape Navigator) and the server. HTTP works through a series of requests from the client and associated server responses back to the client. Each request is independent and results in a server response. A detailed familiarity with HTTP requests and responses is critical to effectively test web applications. Example 8-1 shows what a typical raw HTTP request looks like.

Example 8-1. Typical HTTP GET request

GET /public/content/jsp/news.jsp HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)
Host: www.myserver.com
Connection: Keep-Alive

The first line of the HTTP request typically contains the request method (in this case, the GET method), followed by the file or resource being requested. The version of HTTP the client uses is also appended to the first line of the request. Following this line are various request headers and associated values.
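To make the structure of a request concrete, the following Perl sketch splits a raw request into its request line and headers. It is an illustration only, not part of the scanner developed in this chapter; the parse_request function and the hash keys it returns are names invented here.

```perl
use strict;
use warnings;

# Split a raw HTTP request into its request line and headers.
# parse_request and the returned hash keys are illustrative names.
sub parse_request {
    my ($raw) = @_;

    # Lines end with CRLF on the wire; accept a bare LF as well.
    # Everything before the first blank line is the request head.
    my ($head) = split /\r?\n\r?\n/, $raw, 2;
    my @lines = split /\r?\n/, $head;

    # First line: request method, resource, HTTP version.
    my ($method, $resource, $version) = split ' ', shift @lines;

    # Remaining lines: "Name: value" request headers.
    my %headers;
    for my $line (@lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{$name} = $value;
    }

    return {
        method   => $method,
        resource => $resource,
        version  => $version,
        headers  => \%headers,
    };
}

# A shortened version of the request in Example 8-1.
my $raw = join "\r\n",
    'GET /public/content/jsp/news.jsp HTTP/1.1',
    'Host: www.myserver.com',
    'Connection: Keep-Alive',
    '', '';

my $req = parse_request($raw);
printf "%s %s (%s)\n", $req->{method}, $req->{resource}, $req->{version};
printf "Host header: %s\n", $req->{headers}{Host};
```

The sketch stores header names exactly as they were sent. Header names are case-insensitive in HTTP, so code that must handle arbitrary clients would normalize them first.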

Several HTTP request methods are defined in the HTTP RFC; however, by far the two most common are GET and POST. The primary difference between these methods is in how application parameters are passed to the file or resource being requested. Requests for resources that do not include parameter data are typically made using the GET method (as shown in Example 8-1). GET requests, however, can also include parameter data in the query string portion of the request. The query string normally consists of at least one parameter name/value pair appended to the end of the resource being requested. A question mark (?) separates the resource name from the query string data, and an equals sign (=) separates each parameter name from its value. You can pass multiple parameter name/value pairs in the query string by joining them with an ampersand (&). Example 8-2 shows the same GET request from Example 8-1, but it contains request data in the query string.

Example 8-2. HTTP GET request with query string data

GET /public/content/jsp/news.jsp?id=2&view=F HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)
Host: www.myserver.com
Connection: Keep-Alive
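The query string syntax used in Example 8-2 can be sketched in a few lines of Perl. The build_query and parse_target functions below are hypothetical helpers written for this illustration, and they assume that names and values contain no characters requiring URL encoding.

```perl
use strict;
use warnings;

# Join name/value pairs into a query string: "=" inside a pair, "&" between pairs.
# Assumes the values need no URL encoding.
sub build_query {
    my @pairs = @_;    # flat list: name, value, name, value, ...
    my @parts;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @parts, "$name=$value";
    }
    return join '&', @parts;
}

# Split a request target into the resource name and a hash of parameters.
sub parse_target {
    my ($target) = @_;

    # The first "?" separates the resource from the query string.
    my ($resource, $query) = split /\?/, $target, 2;
    $query = '' unless defined $query;

    my %params;
    for my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        $params{$name} = $value;
    }
    return ($resource, \%params);
}

my $target = '/public/content/jsp/news.jsp?' . build_query(id => 2, view => 'F');
print "$target\n";

my ($resource, $params) = parse_target($target);
printf "resource=%s id=%s view=%s\n", $resource, $params->{id}, $params->{view};
```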

The POST request method is very similar to the GET method, with the exception of how parameter name/value pairs are passed to the application. A POST request passes name/value pairs with the same syntax as that used in a GET request, but it places the data string in the body of the request after all request headers. The Content-Length header is also passed in a POST request to indicate to the HTTP server the length of the POST data string. The Content-Length header value must contain the exact length of the POST data string in bytes, which equals the number of characters when the data is plain ASCII. Example 8-3 shows the request from Example 8-2, but this time using the POST method.

Example 8-3. HTTP POST request with data

POST /public/content/jsp/news.jsp HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)
Host: www.myserver.com
Content-Length: 11
Connection: Keep-Alive

id=2&view=F
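Because a wrong Content-Length value is an easy mistake to make when crafting POST requests by hand, it helps to compute it from the data string. The sketch below assembles a request like the one in Example 8-3. The build_post function is a hypothetical helper, and the Content-Type header it adds does not appear in Example 8-3; browsers normally send it with form data, so it is included here as an assumption about what a server expects.

```perl
use strict;
use warnings;

# Assemble a raw POST request, computing Content-Length from the body.
# build_post is a hypothetical helper written for this illustration.
sub build_post {
    my ($resource, $host, $body) = @_;

    # length() counts characters; this equals the byte count as long as
    # the body contains only ASCII, which is assumed here.
    my $length = length $body;

    return join "\r\n",
        "POST $resource HTTP/1.1",
        "Host: $host",
        # Not shown in Example 8-3; added as an assumption (see above).
        'Content-Type: application/x-www-form-urlencoded',
        "Content-Length: $length",
        'Connection: Keep-Alive',
        '',        # blank line ends the headers
        $body;     # the data string follows in the request body
}

my $request = build_post('/public/content/jsp/news.jsp',
                         'www.myserver.com', 'id=2&view=F');
print $request, "\n";
```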

Each HTTP request results in a response from the server. The structure of the HTTP response is somewhat similar to that of a request, consisting of the HTTP version and response code in the first line, followed by a series of response headers and values. The HTML output the browser renders is included in the body of the HTTP response following the response headers. Unlike the HTTP response headers, the HTML output is rendered to the user and can be viewed in its raw state using the View Source option in most web browsers. Example 8-4 shows a typical HTTP response.

Example 8-4. HTTP response

HTTP/1.1 200 OK
Date: Sat, 10 Jul 2004 23:45:12 GMT
Server: Apache/1.3.26 (Unix)
Cache-Control: no-store
Pragma: no-cache
Content-Type: text/html; charset=ISO-8859-1

<HTML>
<HEAD>
<TITLE>My News Story</TITLE>
</HEAD>
<BODY>
<H1>My News Story</H1>
<P>This is a simple news story.</P>
</BODY>
</HTML>
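The same structure can be pulled apart programmatically. The following sketch splits a raw response into its status line fields, header lines, and body; parse_response and its hash keys are illustrative names, not part of any library.

```perl
use strict;
use warnings;

# Split a raw HTTP response into status line fields, header lines, and body.
# parse_response and its hash keys are illustrative names only.
sub parse_response {
    my ($raw) = @_;

    # A blank line separates the response headers from the body.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    my ($status_line, @header_lines) = split /\r?\n/, $head;

    # Status line: HTTP version, three-digit code, reason phrase.
    my ($version, $code, $reason) = split ' ', $status_line, 3;

    return {
        version => $version,
        code    => $code,
        reason  => $reason,
        headers => \@header_lines,
        body    => defined $body ? $body : '',
    };
}

# A shortened version of the response in Example 8-4.
my $raw_response = join "\r\n",
    'HTTP/1.1 200 OK',
    'Server: Apache/1.3.26 (Unix)',
    'Content-Type: text/html; charset=ISO-8859-1',
    '',
    '<HTML><BODY><P>This is a simple news story.</P></BODY></HTML>';

my $res = parse_response($raw_response);
printf "%s %s, %d header lines\n",
    $res->{code}, $res->{reason}, scalar @{ $res->{headers} };
print "Body: $res->{body}\n";
```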

The response status code consists of a three-digit number returned in the first line of the HTTP response. An HTTP server can return several status codes, all classified based on the first of the three digits. Table 8-1 shows a breakout of the five general status code categories.

Table 8-1. HTTP response codes

Status code                              Category
1XX (e.g., 100 Continue)                 Informational
2XX (e.g., 200 OK)                       Success
3XX (e.g., 302 Found)                    Redirection
4XX (e.g., 404 Not Found)                Client Error
5XX (e.g., 500 Internal Server Error)    Server Error
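Because the category depends only on the first digit, a test script can classify any status code with a small lookup. This is a minimal sketch; the status_category function name is made up for this example.

```perl
use strict;
use warnings;

# Map the first digit of a status code to its general category.
my %CATEGORY = (
    1 => 'Informational',
    2 => 'Success',
    3 => 'Redirection',
    4 => 'Client Error',
    5 => 'Server Error',
);

# status_category is a hypothetical helper; it returns 'Unknown' for
# anything that is not a three-digit code starting with 1 through 5.
sub status_category {
    my ($code) = @_;
    return 'Unknown' unless defined $code && $code =~ /^([1-5])\d\d$/;
    return $CATEGORY{$1};
}

for my $code (100, 200, 302, 404, 500) {
    printf "%d %s\n", $code, status_category($code);
}
```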

8.1.2. SSL

You can use Secure Sockets Layer (SSL) to encrypt the communications channel between the web browser client and the server. HTTP carried over SSL is usually referred to as HTTPS, but underneath the encryption the HTTP requests and responses still look the same. Many people think that simply because HTTPS is used, the application or server is "secure" and resilient to attack. It is important to realize that SSL merely protects the request and response data while in transit so that someone eavesdropping on the network or otherwise intercepting the data cannot read it. The underlying data and associated application, however, are still susceptible to attack by the end user.

Common SSL Misconceptions

The web server is secure because SSL is used.
SSL secures the web application.
HTTP exploits do not work over SSL.

8.1.3. Perl and LWP

We will use the Perl scripting language to develop the web application scanner outlined in this chapter. Perl's extensive support of regular expressions and its platform independence make it a great language with which to develop our scanner. We have kept the code syntax as straightforward and easy to follow as possible, and we will explain each block of code as we develop it. We will use the libwww-perl user agent module (LWP::UserAgent), which is included with many Perl installations. LWP is essentially a WWW client library that allows you to easily make HTTP requests from a Perl script. If you want to learn more about LWP, read Perl and LWP, by Sean Burke (O'Reilly).
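As a first taste of LWP, the sketch below creates a user agent and, when given a URL on the command line, issues a GET request and prints a summary of the response. It is not the scanner code developed later in the chapter. The make_agent function, the agent string, the timeout value, and the script name in the comment are all arbitrary choices made for this example.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Create a user agent. The agent string and timeout are example values.
sub make_agent {
    return LWP::UserAgent->new(
        agent   => 'ExampleScanner/0.1',
        timeout => 10,
    );
}

# Make a request only when a URL is given on the command line, e.g.:
#   perl lwp_get.pl http://www.myserver.com/public/content/jsp/news.jsp
if (my $url = shift @ARGV) {
    my $ua       = make_agent();
    my $response = $ua->get($url);

    # status_line combines the status code and reason phrase.
    print $response->status_line, "\n";

    my $type = $response->header('Content-Type');
    print "Content-Type: $type\n" if defined $type;

    print length($response->content), " bytes in body\n"
        if $response->is_success;
}
```

The get method returns a response object even when the request fails, so a script should check is_success (or inspect the status code) rather than assume the body contains the page it asked for.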

Got LWP?

If you're not sure whether LWP is included in your Perl installation, use the following command to check:

% perl -MLWP -le "print(LWP->VERSION)"

If LWP is not already installed, you should obtain and install the most recent version from the Comprehensive Perl Archive Network (CPAN). Use the following commands to install LWP using CPAN:

% perl -MCPAN -e shell
cpan> install Bundle::LWP

Another nice thing about LWP is that it supports HTTP requests over SSL as long as the Crypt::SSLeay Perl module and OpenSSL libraries are installed. (LWP releases from 6.0 onward instead obtain HTTPS support from the separately distributed LWP::Protocol::https module.) If you want to use the scanner on HTTPS web applications, ensure that the required module and the OpenSSL libraries are installed and working.
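A script can also check for these modules itself before it tries to use them. The following sketch reports whether LWP::UserAgent and Crypt::SSLeay can be loaded; module_available is a hypothetical helper name, and the list of modules is simply the pair mentioned above.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this Perl, 0 otherwise.
# module_available is a hypothetical helper written for this illustration.
sub module_available {
    my ($module) = @_;

    # Convert Some::Module to Some/Module.pm, the form require expects
    # when it is given a string rather than a bareword.
    (my $file = "$module.pm") =~ s{::}{/}g;

    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(LWP::UserAgent Crypt::SSLeay)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```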

8.1.4. Web Application Vulnerabilities

When we use the term web application vulnerabilities, we are referring to vulnerabilities that result from poorly written application code. These vulnerabilities range from application components that do not properly validate external input before processing it (which leads to flaws such as SQL injection) to code that does not properly authenticate users before allowing access. The nature and classifications of web application vulnerabilities are outside the scope of this chapter, but we give a quick overview of these vulnerabilities in the sidebar "Open Web Application Security Project."

Open Web Application Security Project

If you are not familiar with web application vulnerabilities, the Open Web Application Security Project (www.owasp.org) is a great resource that can bring you up to speed. OWASP has developed a Top Ten List of the most critical web application vulnerabilities. The list is not all-inclusive, but it represents many of the critical issues present in web-based applications.

Top Ten Most Critical Web Application Vulnerabilities 2004

Unvalidated input
Broken access control
Broken authentication and session management
Cross-Site Scripting (XSS)
Buffer overflows
Injection flaws
Improper error handling
Insecure data storage
Denial of service
Insecure configuration management