Team LiB
Previous Section Next Section

10.5. File Disclosure

File disclosure refers to the case when someone manages to download a file that would otherwise remain hidden or require special authorization.

10.5.1. Path Traversal

Path traversal occurs when directory backreferences are used in a path to gain access to the parent folder of a subfolder. If the software running on a server fails to resolve backreferences, it may also fail to detect an attempt to access files stored outside the web server tree. This flaw is known as path traversal or directory traversal. It can exist in a web server (though most web servers have fixed these problems) or in application code. Programmers often make this mistake.

If it is a web server flaw, an attacker only needs to ask for a file she knows is there:

Even when she doesn't know where the document root is, she can simply increase the number of backreferences until she finds it.

Apache 1 will always respond with a 404 response code to any request that contains a URL-encoded slash (%2F) in the filename even when the specified file exists on the filesystem. Apache 2 allows this behavior to be configured at runtime using the AllowEncodedSlashes directive.

10.5.2. Application Download Flaws

Under ideal circumstances, files will be downloaded directly using the web server. But when a nontrivial authorization scheme is needed, the download takes place through a script after the authorization. Such scripts are web application security hot spots. Failure to validate input in such a script can result in arbitrary file disclosure.

Imagine a set of pages that implement a download center. Download happens through a script called download.php, which accepts the name of the file to be downloaded in a parameter called filename. A careless programmer may form the name of the file by appending the filename to the base directory:

$file_path = $repository_path + "/" + $filename;

An attacker can use the path traversal attack to request any file on the web server:

You can see how I have applied the same principle as before, when I showed attacking the web server directly. A naïve programmer will not bother with the repository path, and will accept a full file path in the parameter, as in:

A file can also be disclosed to an attacker through a vulnerable script that uses a request parameter in an include statement:


PHP will attempt to run the code (making this flaw more dangerous, as I will discuss later in the section "Code Execution"), but if there is no PHP code in the file it will output the contents of the file to the browser.

10.5.3. Source Code Disclosure

Source code disclosure usually happens when a web server is tricked into displaying a script instead of executing it. A popular way of doing this is to modify the URL enough to confuse the web server (and prevent it from determining the MIME type of the file) and simultaneously keep the URL similar enough to the original to allow the operating system to find it. This will become clearer after a few examples.

URL-encoding some characters in the request used to cause Tomcat and WebLogic to display the specified script file instead of executing it (see In the following example, the letter p in the extension .jsp is URL-encoded:

Appending a URL-encoded null byte to the end of a request used to cause JBoss to reveal the source code (see

Apache will respond with a 404 (Not found) response to any request that contains a URL-encoded null byte in the filename.

Many web servers used to get confused by the mere use of uppercase letters in the file extension (an attack effective only on platforms with case-insensitive filesystems):

Another way to get to the source code is to exploit a badly written script that is supposed to allow selective access to source code. At one point, Internet Information Server shipped with such a script enabled by default (see The script was supposed to show source code to the example programs only, but because programmers did not bother to check which files were being requested, anyone was able to use the script to read any file on the system. Requesting the following URL, for example, returned the contents of the boot.ini file from the root of the C: drive:

Most of the vulnerabilities are old because I chose to reference the popular servers to make the examples more interesting. You will find that new web servers almost always suffer from these same problems.

10.5.4. Predictable File Locations

You have turned directory listings off and you feel better now? Guessing filenames is sometimes easy:

Temporary files

If you need to perform a quick test on the web server, chances are you will name the file according to the test you wish to make. Names like upload.php, test.php, and phpinfo.php are common (the extensions are given for PHP but the same logic applies to other environments).

Renamed files

Old files may be left on the server with names such as index2.html, index.old.html, or index.html.old.

Application-generated files

Web authoring applications often generate files that find their way to the server. (Of course, some are meant to be on the server.) A good example is a popular FTP client, WS_FTP. It places a log file into each folder it transfers to the web server. Since people often transfer folders in bulk, the log files themselves are transferred, exposing file paths and allowing the attacker to enumerate all files. Another example is CityDesk, which places a list of all files in the root folder of the site in a file named citydesk.xml. Macromedia's Dreamweaver and Contribute have many publicly available files.

Configuration management files

Configuration management tools create many files with metadata. Again, these files are frequently transferred to the web site. CVS, the most popular configuration management tool, keeps its files in a special folder named CVS. This folder is created as a subfolder of every user-created folder, and it contains the files Entries, Repository, and Root.

Backup files

Text editors often create backup files. When changes are performed directly on the server, backup files remain there. Even when created on a development server or workstation, by the virtue of bulk folder FTP transfer, they end up on the production server. Backup files have extensions such as ~, .bak, .old, .bkp, .swp.

Exposed application files

Script-based applications often consist of files not meant to be accessed directly from the web server but instead used as libraries or subroutines. Exposure happens if these files have extensions that are not recognized by the web server as a script. Instead of executing the script, the server sends the full source code in response. With access to the source code, the attacker can look for security-related bugs. Also, these files can sometimes be manipulated to circumvent application logic.

Publicly accessible user home folders

Sometimes user home directories are made available under the web server. As a consequence, command-line history can often be freely downloaded. To see some examples, type inurl:.bash_history into Google. (The use of search engines to perform reconnaissance is discussed in Chapter 11.)

Most downloads of files that should not be downloaded happen because web servers do not obey one of the fundamental principles of information securityi.e., they do not fail securely. If a file extension is not recognized, the server assumes it is a plain text file and sends it anyway. This is fundamentally wrong.

You can do two things to correct this. First, configure Apache to only serve requests that are expected in an application. One way to do this is to use mod_rewrite and file extensions.

# Reject requests with extensions we don't approve
RewriteCond %{SCRIPT_FILENAME} "!(\.html|\.php|\.gif|\.png|\.jpg)$"
RewriteRule .* - [forbidden]

Now even if someone uploads a spreadsheet document to the web server, no one will be able to see it because the mod_rewrite rules will block access. However, this approach will not protect files that have allowed extensions but should not be served. Using mod_rewrite, we can create a list of requests we are willing to accept and serve only those. Create a plain text file with the allowed requests listed:

# This file contains a list of requests we accept. Because
# of the way mod_rewrite works each line must contain two
# tokens, but the second token can be anything.
/ -
/index.php -
/news.php -
/contact.php -

Add the following fragment to the Apache configuration. (It is assumed the file you created was placed in /usr/local/apache/conf/

# Associate a name with a map stored in a file on disk
RewriteMap allowed_urls txt:/usr/local/apache/conf/
# Try to determine if the value of variable "$0" (populated with the
# request URI in this case) appears in the rewrite map we defined
# in the previous step. If there is a match the value of the
# "${allowed_urls:$0|notfound}" variable will be replaced with the
# second token in the map (always "-" in our case). In all other cases
# the variable will be replaced by the default value, the string that
# follows the pipe character in the variable - "notfound".
RewriteCond ${allowed_urls:$0|notfound} ^notfound$
# Reject the incoming request when the previous rewrite
# condition evaluates to true.
RewriteRule .* - [forbidden]

    Team LiB
    Previous Section Next Section