
8.2. Low-Interaction Client Honeypots

The principle behind low-interaction honeypots can also be used to learn more about threats in communication networks. In this section, we introduce several ways to build low-interaction client honeypots. With the help of these tools, you can collect information about malicious attacks in several areas. Our first example deals with malicious HTML files, but the basic principle extends to similar areas, such as malicious images or other file formats.

The risk involved in running such a solution is rather low. We have good control over what is happening with our low-interaction client honeypots, since we steer the progress of the tool. In addition, you can safeguard your honeyclient with mechanisms like chroot or Systrace, as outlined in Section 3.7.

Low-interaction client honeypots are likely to have a lower detection rate, since the attack must already be known to the client honeypot; new attacks are likely to go unnoticed. On the other hand, their lightweight nature makes low-interaction client honeypots easy to deploy and operate, and fast when interacting with servers.

In the next section, we present the general setup of a low-interaction honeyclient that can be used to detect malicious websites. We describe the individual building blocks in detail and identify possible caveats. At the end, we show how these building blocks can be linked together and how a possible extension to other areas can be derived.

8.2.1. Learning About Malicious Websites

If we want to find malicious websites, we need to proceed in two steps. In the first step, we try to find suspicious sites. There are a variety of options, some more promising than others. In the second step, we identify whether any of these sites is really malicious. Since we follow the low-interaction honeypot paradigm in this section, we use simple but efficient mechanisms to carry out this identification process. In Section 8.3 we will outline how a more powerful but slower mechanism can be realized with the help of high-interaction honeyclients.

As just mentioned, the first step of this methodology will be to find sites attacking web browsers. We have several options, and our experience shows that the most promising are the following:

The first four options are the most promising and also the easiest to implement. With all approaches, you get a list of suspicious URLs. Of course, the associated websites can also be examined recursively: We start with the first URL of the results, download it to our hard disk, extract all links from it, and add those links to our list of suspicious URLs. This way, we can crawl suspicious parts of the World Wide Web. In addition, we should use mechanisms to restrict the depth and breadth we crawl, or our focus may become too broad. As a best practice, it has proven better to crawl wider than deeper: More domains are visited this way, and the chances of hitting a malicious site are higher.
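
To make the crawling step more concrete, here is a minimal sketch of such a bounded crawler in Ruby. The function names, the simplistic regular expression for link extraction, and the depth and page limits are illustrative choices of our own, not part of any particular tool.

require 'net/http'
require 'uri'
require 'set'

# Fetch a page and return the absolute URLs of all links found in it.
# A real crawler would use a proper HTML parser instead of a regex.
def extract_links(url)
  body = Net::HTTP.get(URI.parse(url))
  body.scan(/href\s*=\s*["']([^"']+)["']/i).flatten.map { |link|
    URI.join(url, link).to_s rescue nil
  }.compact
rescue StandardError
  []
end

# Breadth-first crawl, bounded by link depth and total page count so
# that the focus does not become too broad.
def crawl(seed_urls, max_depth = 2, max_pages = 500)
  visited = Set.new
  queue   = seed_urls.map { |url| [url, 0] }
  until queue.empty? || visited.size >= max_pages
    url, depth = queue.shift
    next if visited.include?(url) || depth > max_depth
    visited << url
    extract_links(url).each { |link| queue << [link, depth + 1] }
  end
  visited.to_a
end

puts crawl(['http://example.com/'])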

For a crawling engine, we can use one of the many available on the Internet. One example is crawl, which is available at http://monkey.org/~provos/crawl. This crawler is optimized to download JPEG images, but you can also use it to mirror other file types. Other popular crawlers include Heritrix (http://crawler.archive.org), the crawler of the Internet Archive, and WebSPHINX, a multithreaded crawler written in Java. We can also use tools like wget or curl to download the URLs we have extracted from search engines or spam messages. In all cases, you should make sure that the User-Agent is set to a value that imitates a legitimate browser.

The User-Agent field is an HTTP header field that can be used to distinguish a crawler from a human. Attackers often use it to decide which exploit is delivered: If the User-Agent points to a crawler, an innocent HTML page is served, and if the User-Agent points to a vulnerable version of Internet Explorer, the web page includes an exploit. For wget and curl, you can set it in the following way:

# Microsoft Internet Explorer 7.0b running on Windows Vista;
# additionally, download all files specified in urls.txt
wget -i urls.txt -U "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)"

# Microsoft Internet Explorer 6.0 running on Windows XP
# (append the URL you want to retrieve to this command)
curl -A 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)'


A particularly powerful crawler is Heritrix, the crawler of the Internet Archive (http://www.archive.org/index.php). The goal of the Internet Archive is to build an "Internet library," that is, a digital library that contains as many documents as possible from the Internet. The archive aims to offer researchers, historians, and scholars permanent access to historical collections that exist in digital format, including archived web pages, texts, audio, video, and even software. Since 1996, the people behind the Internet Archive have searched for ways to preserve this kind of data, and as a side project they have implemented the crawler Heritrix. This tool is specially designed for web archiving, that is, downloading a given location as completely as possible. And this is exactly what we need for a low-interaction honeyclient: We have to examine a given location as thoroughly as possible to detect malicious content on these sites. For example, the attacker could embed his malcode in some obfuscated JavaScript within the HTML of the page, or he could insert a link to a malicious image that triggers an overflow on the victim's machine. Thus, we also need to follow the links, and this is where Heritrix comes in. We will not go into details here, but if you are interested, you can find more information about the crawler at http://crawler.archive.org.

At this point, we have a mechanism to automatically find and download websites that could be suspicious. We have only downloaded them to our local hard disk, however; no analysis has happened yet. Therefore, the second step is to analyze the downloaded files for malicious content. A simple way to start such an analysis is to check all files with common antivirus engines. For example, the tool ClamAV can even detect malicious web pages that contain phishing scams or browser exploits, as the following listing shows:

$ /usr/bin/file *.html
phish.html: HTML document text
setslice-exploit.html: HTML document text

$ /usr/bin/clamscan *.html
phish.html: HTML.Phishing.Bank-44 FOUND
setslice-exploit.html: Exploit.CVE-2006-3730 FOUND

----------- SCAN SUMMARY -----------
Known viruses: 76054
Engine version: 0.88.5
Scanned directories: 0
Scanned files: 2
Infected files: 2
Data scanned: 0.03 MB
Time: 1.910 sec (0 m 1 s)
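
Such a scan is easy to automate. The following minimal Ruby sketch runs clamscan over a directory of downloaded files and collects every file the scanner flags; the directory path is a placeholder of our own, and we assume clamscan is installed and in the search path.

# Run clamscan over the download directory and collect flagged files.
flagged = {}
IO.popen('clamscan --no-summary /tmp/downloads/*.html') do |scan|
  scan.each_line do |line|
    # clamscan reports hits in the form "<file>: <signature> FOUND"
    flagged[$1] = $2 if line =~ /^(.+): (.+) FOUND$/
  end
end
flagged.each { |file, signature| puts "ALERT: #{file}: #{signature}" }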


Besides scanning with antivirus engines, we can use more advanced methods, such as analyzing the content with an intrusion detection system like Snort or with custom static analysis.

Figure 8.3 depicts the whole setup of such a low-interaction client honeypot. We have a crawling engine that downloads suspicious websites and other files from the Internet to our honeypot. The input for the crawler is the result of search engine queries with suspicious terms (e.g., warez) or URLs extracted from spam messages. All downloaded data is then handed over to a checking engine. This engine analyzes the content in two ways. First, we can use antivirus engines to check for known malware or other bad content. Second, we use our own database of malware signatures to search the downloaded files for malicious content. If the checking engine finds something malicious, it generates an alert and notifies the operator of the honeyclient. As you can see, the whole design is rather simple, but it is sufficient.

Figure 8.3. General setup of low-interaction client honeypots.
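
The second check, matching downloaded files against our own signature database, can be as simple as a set of regular expressions. The following sketch uses patterns we will encounter again later in this chapter; the directory path and the signature names are our own illustrative choices.

# An illustrative signature database: descriptive names mapped to
# regular expressions that indicate known-malicious content.
SIGNATURES = {
  'r57shell PHP backdoor'     => /r57shell/,
  'Webattacker control panel' => /Web-Attacker Control panel/,
  'r3v3ng4ns defacement tool' => /r3v3ng4ns/
}

Dir.glob('/tmp/downloads/**/*').each do |path|
  next unless File.file?(path)
  content = File.read(path)
  SIGNATURES.each do |name, pattern|
    puts "ALERT: #{path}: #{name} found" if content =~ pattern
  end
end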


One remaining issue is the revisit policy, that is, how often we check a given suspicious website for new content. Attackers often change the file hosted at a certain location; from the attacker's point of view, this can become necessary, for example, when antivirus engines start to detect the first binary. By changing the malware but retaining the original website, the attacker can seed new malware. As a best practice, suspicious websites should be revisited on a daily basis to detect such changes.
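
A minimal sketch of such a revisit check follows: We store a cryptographic hash of every fetched page and compare it with the hash recorded during the previous visit. The state file and the URL list are placeholders of our own choosing; invoked once a day, for example via cron, this would implement the daily revisit policy.

require 'digest/sha1'
require 'net/http'
require 'uri'
require 'yaml'

STATE_FILE = '/tmp/honeyclient-hashes.yml' # example location

# Load the hashes recorded during the previous visit.
state = File.exist?(STATE_FILE) ? YAML.load_file(STATE_FILE) : {}

urls = ['http://example.com/suspicious.html'] # placeholder URL list
urls.each do |url|
  digest = Digest::SHA1.hexdigest(Net::HTTP.get(URI.parse(url)))
  if state[url] && state[url] != digest
    puts "CHANGED: #{url} serves different content than on the last visit"
  end
  state[url] = digest
end

# Persist the hashes for the next run.
File.open(STATE_FILE, 'w') { |f| f.write(state.to_yaml) }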

There are some other issues with crawlers that we will touch on briefly. Active or dynamic content like JavaScript, Flash, or similar content can pose a problem, since the crawler normally cannot execute or display it. As a result, we may miss certain types of exploits. A related problem arises because we are using a crawler rather than a real browser: An attacker who notices the simulation will simply not serve the exploit to the honeyclient. An attacker can spot a crawler by fingerprinting the requests and looking for suspicious signs or unusual timing.

8.2.2. HoneyC

HoneyC is an implementation of the low-interaction client honeypot concept. As just explained, these types of client honeypots do not use a fully functional operating system and web browser to analyze malicious content on the web but use a simulated client. Malicious servers can then be detected by statically examining the web server's response — for example, by searching for exploits with the help of different signatures.

HoneyC uses simulated clients that can solicit as much of a response from a server as necessary for analysis of malicious content. Due to the modular design, HoneyC is flexible and can be extended in many different ways. It can simulate different visitor clients, search via different methods for suspect websites, and analyze the collected data in various ways.

The initial HoneyC version (releases 1.2.x) concentrates on searching for malicious web servers based on Snort signatures. The initial version does not contain any malware signatures yet, but the author plans to add them in the near future. The official website of the tool is http://honeyc.sourceforge.net/, and you can reach a support forum at http://sourceforge.net/forum/?group_id=172208.

The schematic overview of HoneyC is depicted in Figure 8.4. The client honeypot consists of three different components: queuer, visitor, and analysis engine. These modules interact with each other, and the logical flow of information is shown in the figure. The queuer is the component responsible for creating a queue of suspicious servers that should be analyzed further. It can employ several different methods to create this queue, as outlined above. Version 1.0.0 of HoneyC contains a Yahoo search queuer that creates a list of servers by querying the Yahoo Search API, a web service offered by Yahoo! that allows easy access to the results of its search engine. In version 1.1.2, a simple list queuer was added that lets you statically set a list of server requests to be inserted into the queue. Beyond these two components, HoneyC does not offer additional queuers. However, extending the queuer to support crawling via other web services or link extraction from spam messages should not be too hard, as the sketch after Figure 8.4 shows.

Figure 8.4. Schematic overview of HoneyC.
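
As noted above, a queuer that extracts links from spam messages is easy to sketch. The following minimal example assumes the spam trap delivers to a standard Unix mbox file (the path is a placeholder) and prints one URL per line; note that HoneyC's real queuers emit serialized request objects rather than raw URLs.

require 'uri'

# Extract all HTTP(S) URLs from a mailbox of spam messages and print
# them, one per line, as input for further processing.
mbox = File.read('/var/mail/spamtrap')
URI.extract(mbox, ['http', 'https']).uniq.each { |url| puts url }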


All collected information is handed over to the visitor. This component is responsible for the actual interaction with the suspicious web server. The visitor usually makes a request to the server, simulating a normal web browser, and afterward consumes and processes the response. Version 1.0.0 of HoneyC contains a web browser visitor component that allows you to visit web servers.

All information collected by the visitor is then handed over to the analysis engine. This component checks whether a security policy was violated while the visitor interacted with the web server. This check is currently done by processing the response with a given set of Snort-based signatures. The analysis process is rather simple: A given response from the visitor can be examined via regular expressions that match either the content of the response or the URL. If one of these rules matches, an alert is generated, and some additional information is shown to the user. Please note that version 1.2.0 does not perform an analysis for malicious content with the provided example rules. These check only simple heuristics, and you need to implement your own Snort rules to achieve informative results. The author plans to add more advanced rules in a future release. Several other analysis mechanisms could be incorporated into the tool, for example, checking the downloaded results with common antivirus engines or performing a behavior-based analysis.

All components let you use pluggable modules to suit specific needs. This is achieved by loosely coupling the components via a command redirection operator, that is, via pipes, and passing a serialized representation of the request and response objects through those pipes. Based on this modular design, it is easy to implement a new component as just outlined and to deploy HoneyC flexibly. For example, the queuer component could be extended with a web service query via Google's search API, or you could implement a queuer component in another language, such as Python, that crawls a given URL. A visitor component could simply use a scripted wget request to retrieve the files, or an instrumented version of Firefox. Finally, the analysis engine can use simple Snort rules to scan for suspect websites, or it can use an emulated environment and study the behavior observed when accessing the file in that environment.
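
The following toy example illustrates the coupling principle (it mirrors only the idea, not HoneyC's actual object model): a hypothetical queuer serializes request objects as YAML documents to standard output, and the next stage deserializes them from standard input.

# toy_queuer.rb: emit serialized request objects on stdout.
require 'yaml'

['http://example.com/a.html', 'http://example.com/b.html'].each do |url|
  puts({ 'url' => url }.to_yaml)
end

# toy_visitor.rb: consume serialized requests from stdin.
require 'yaml'

YAML.load_stream($stdin.read).each do |request|
  puts "would fetch #{request['url']}"
end

The two stages are then chained with a pipe, just like HoneyC's components:

$ ruby toy_queuer.rb | ruby toy_visitor.rb
would fetch http://example.com/a.html
would fetch http://example.com/b.html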

HoneyC is OS independent because it is written in Ruby, an interpreted scripting language. Our experience with the tool is based on running it in a Linux environment, and we strongly advise you to do likewise or to use another Unix machine. Most malicious content you will find targets vulnerabilities in Internet Explorer or other Windows products. Therefore, you run a much higher risk of infecting your analysis machine if it runs Windows instead of Linux.

Installation of HoneyC is very easy. In the first step, make sure that you have Ruby installed. Ruby is an object-oriented scripting language and free software distributed under an open source license. If you do not have Ruby installed, use the package management solution of your Linux distribution, for example, by executing sudo aptitude install ruby on a machine running Debian GNU/Linux. Then download the latest version of HoneyC from the official website and extract the ZIP file. Change into the extracted directory, and start the unit tests, which check whether your system meets all requirements. Note that the unit tests need network connectivity and direct outgoing access on port 80 to succeed. The whole checking process can be started with the following command:

$ ruby -v UnitTester.rb
ruby 1.8.5 (2006-08-25) [i486-linux]
[...]
Started
...............<httpResponses>
</httpResponses>
.........................................
[...]
153 tests, 206 assertions, 1 failures, 0 errors


If no error occurs, you are ready to use HoneyC. However, if the unit test finds errors, please consult the Readme file for workarounds for common problems, or ask your question in the help forum.

Each of the three components has excellent built-in help, which you can access via the parameter --help in the files queuer/YahooSearch.rb, visitor/WebBrowser.rb, and analysisEngine/SnortRulesAnalysisEngine.rb, respectively. For example, the built-in help explains in detail the format of the analysis engine configuration, which we will later illustrate with a running example:

$ ruby analysisEngine/SnortRulesAnalysisEngine.rb --help
Usage: ruby -s analysisEngine/SnortRulesAnalysisEngine.rb
         -c=[location of snort rules analysis engine configuration file]
Analyze http responses against snort rules and output a report.

Snort Rules Analysis Engine Configuration File Format
-----------------------------------------------------
<snortRulesAnalysisEngineConfiguration
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation=
 "SnortRulesAnalysisEngineConfiguration_v1_0.xsd">
   <rulesLocation>analysisEngine/example.rules</rulesLocation>
</snortRulesAnalysisEngineConfiguration>

The snort configuration file simply specifies the relative or absolute
location of the rules file.

Snort Rules File Format
-----------------------
alert tcp any any <> any any (msg: "rule1"; reference:url,http://rule1.com;
 sid:1000001; rev:1; classtype:trojan-activity; pcre:"/rule1pcre/"; )
alert tcp any any <> any any (msg: "google"; reference:url,http://rule2.com;
 sid:1000002; rev:2; classtype:attempted-dos; pcre:"/google/"; )
alert tcp any any <> any any (msg: "rule3"; reference:url,http://rule3.com;
 sid:1000003; rev:1; classtype:trojan-activity; pcre:"/rule3pcre/"; )

The Snort rules file format adheres to the official Snort rules format
(see the Snort manual at http://www.snort.org). Some restrictions apply within
the context of HoneyC.
In addition to the official Snort rules format, HoneyC supports the
additional tag headercontent. It can be used to match on specific http
response header content. Matching can restrict the key value pair by
creating a match string in the following format:
headercontent:"name="key">value<". In conjunction with this new tag, a new
pcre option H has been implemented to support pcres on header content.

Report bugs to <http://sourceforge.net/tracker/?group_id=172208&atid=860868>



Now we take a look at the actual configuration process based on a longer example. Imagine that you are interested in learning more about Webattacker, a toolkit that allows an attacker to easily build a malicious website with diverse exploits for Internet Explorer. Moreover, Webattacker includes scripts that detect the Internet Explorer version of the attacked machine and spam-sending techniques to lure victims to the malicious websites. The toolkit has support for many vulnerabilities, including the following:

As you can see, quite a few Internet Explorer vulnerabilities can be exploited via this toolkit. The complete kit was even sold for a small fee: Sophos reports that it costs about $15 on the black market. Usually the attacker installs some kind of Trojan horse on the infected machine and thus gains complete control over it.

One way to learn more about this tool is to search for characteristic signatures of this exploit. One characteristic signature for Webattacker is the URL of the exploit itself, which is served as a CGI script. It usually contains the strings "ie" and ".cgi" with a number. This is a piece of information that we can use to detect Webattacker attacks. If we are also interested in attacks that use the tool r57shell, a PHP-based backdoor used in attacks against web applications, we simply use that string to detect suspicious sites.
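
The URL check itself boils down to a short regular expression. Here is a sketch, using the sample URL that we will encounter again in the HoneyC output later in this section:

# Webattacker exploit pages are typically CGI scripts whose names
# combine "ie" with a number, e.g., ie0604.cgi.
WEBATTACKER_URL = /ie\d+\.cgi/

url = 'http://img.secondsite2.com/cgi-bin/ie0604.cgi'
puts "suspicious: #{url}" if url =~ WEBATTACKER_URL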

Based on the information we want to collect, we can start to configure HoneyC. The main configuration file is in XML format, and we base our running example on the file HoneyCConfigurationExample.xml, which is included in the 1.2.0 release of HoneyC. This file specifies where the tool can find the configuration files for the three components:

<honeyCConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="HoneyCConfiguration_v1_0.xsd">
   <queuer>ruby -s queuer/YahooSearch.rb
    -c=queuer/YahooSearchConfigurationExample.xml</queuer>
   <visitor>ruby -s visitor/WebBrowser.rb
    -c=visitor/WebBrowserConfigurationExample.xml</visitor>
   <analysisEngine>ruby -s analysisEngine/SnortRulesAnalysisEngine.rb
    -c=analysisEngine/SnortRulesAnalysisEngineConfigurationExample.xml
   </analysisEngine>
</honeyCConfiguration>



For each component, we have to specify the path and possible arguments. These additional configuration files are then used to actually configure the behavior of HoneyC. For now, we simply use the defaults and change only the configuration of the different components.

The format of the queuer configuration file is simple. We specify the string we want to search for and the maximum number of results we want. This information is entered in the file queuer/YahooSearchConfigurationExample.xml and could look like the following example:

<yahooSearchConfiguration
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="YahooSearchConfiguration_v1_0.xsd"
 applicationID="_HoneyC_">
   <query results="100" format="html">ie0604.cgi</query>
</yahooSearchConfiguration>


Based on this configuration, the queuer searches via the Yahoo Search API for suspicious websites that are returned when searching for the string "ie0604.cgi," a typical sign of Webattacker or related attacks. In the second step, these suspicious sites are visited with the help of the visitor component, which is configured via the file visitor/WebBrowserConfigurationExample.xml. Again, we enter the details in XML format and specify the user agent, whether links should be followed, and how many threads the visitor component can use:

<webBrowserConfiguration
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="WebBrowserConfiguration_v1_0.xsd">
   <userAgent>Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)</userAgent>
   <followALink>false</followALink>
   <browserThreads>30</browserThreads>
</webBrowserConfiguration>


In the running example, we configure the visitor component to behave like an Internet Explorer 6 instance running on Windows XP. HoneyC uses this configuration to investigate all queued URLs and then passes the results to the analysis engine. The configuration of this component specifies the path of the rules:

<snortRulesAnalysisEngineConfiguration
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation=
  "SnortRulesAnalysisEngineConfiguration_v1_0.xsd">
   <rulesLocation>analysisEngine/example.rules</rulesLocation>
</snortRulesAnalysisEngineConfiguration>


The specified file contains the rules that are checked for each URL found. In our running example, we use different rules to detect the PHP backdoor r57shell and two characteristic strings of Webattacker. Moreover, we search for a common defacement tool that contains the name r3v3ng4ns:

alert tcp any any <> any any (msg: "suspicious string 'PHP shell' found";
  sid:1000001; rev:1; classtype:trojan-activity; pcre:"/r57shell/"; )
alert tcp any any <> any any (msg: "possible 'Web-Attacker' found";
  sid:1000002; rev:1; classtype:trojan-activity;
  pcre:"/Web-Attacker Control panel/"; )
alert tcp any any <> any any (msg: "possible 'Web-Attacker' found";
  sid:1000003; rev:1; classtype:trojan-activity;
  pcre:"/Err: this user is already attacked!/"; )
alert tcp any any <> any any (msg: "suspicious defacing tool found";
  sid:1000004; rev:1; classtype:trojan-activity; pcre:"/r3v3ng4ns/"; )



The pcre fields in these rules are standard Perl-compatible regular expressions. Since all components are now configured, we can start the actual HoneyC process via the following command:

$ ruby -s HoneyC.rb -c=HoneyCConfigurationExample.xml
01/19-04:10:15.000000 [**] [1:1000001:1] possible Web-Attacker found [**]
  [Classification: A Network Trojan was detected] [Priority: 1] {TCP}
  localhost -> http://img.secondsite2.com/cgi-bin/ie0604.cgi
[...]
Snort Rules Analysis Engine Statistics:
Analyzed 315 responses in 100.365605
Matches found 4
Average Analysis Time: 0.000812250793650794
HttpResponse Statistics:
All Count: 315
Average Size (200-OK): 3680
Error Code Count: [-403 - Forbidden 1--200 - OK 292--404 - Not Found 19-
  -408 - Timeout Error 2--501 - getaddrinfo: Name or service not known 1-]
Content Type Count: [-application/x-javascript; charset=utf8 2-
  -text/plain 1--image/png 17--text/html; charset=utf-8 3-
  -application/x-javascript; charset=utf-8 5--application/x-javascript 22-
  -image/jpeg 18--text/html; charset=iso-8859-1 3--image/gif 182-]
Content Type Length Averages: [-application/x-javascript; charset=utf8 278-
  -text/plain 9415--image/png 13666--text/html; charset=utf-8 4547-
  -application/x-javascript; charset=utf-8 4987-
  -application/x-javascript 10684--image/jpeg 17711-
  -text/html; charset=iso-8859-1 287--image/gif 1313-]



The tool now runs and searches for websites that match the specified criteria. If your rules and search terms have been carefully chosen, the chances are high that you will find an interesting page. In this example, HoneyC successfully identifies one page as infected, and a manual analysis can provide you with even more information about this kind of attack.

Due to the lack of real malware signatures, the current version of HoneyC is a bit limited. It can be useful for finding further instances of well-known attacks, but it lacks the signatures and more flexible components needed to detect stealthier attacks. Nevertheless, it could become a useful tool in the area of low-interaction client honeypots.
