1.2. Honeypot Background

Before we turn to a technical discussion of honeypots, some background is helpful. To motivate the use of honeypots, we first look at network intrusion detection systems (NIDS) [64]. The amount of useful information provided by NIDS is decreasing in the face of ever more sophisticated evasion techniques [70,105] and an increasing number of protocols that employ encryption to protect network traffic from eavesdroppers. NIDS also suffer from high false positive rates that decrease their usefulness even further. Honeypots can help with some of these problems.

A honeypot is a closely monitored computing resource that we want to be probed, attacked, or compromised. More precisely, a honeypot is "an information system resource whose value lies in unauthorized or illicit use of that resource" (the definition from the honeypot mailing list at SecurityFocus, http://www.securityfocus.com/archive/119). The value of a honeypot is measured by the information that can be obtained from it. Monitoring the data that enters and leaves a honeypot lets us gather information that is not available to NIDS. For example, we can log the keystrokes of an interactive session even if encryption is used to protect the network traffic. To detect malicious behavior, NIDS require signatures of known attacks and often fail to detect compromises that were unknown at the time they were deployed. Honeypots, on the other hand, can detect vulnerabilities that are not yet understood. For example, we can detect a compromise by observing network traffic leaving the honeypot, even if the means of the exploit has never been seen before.

Because a honeypot has no production value, any attempt to contact it is suspicious by definition. Consequently, forensic analysis of data collected from honeypots is less likely to lead to false positives than data collected by NIDS. Most of the data that we collect with the help of a honeypot can help us to understand attacks.

Honeypots can run any operating system and any number of services. The configured services determine the vectors available to an adversary for compromising or probing the system. A high-interaction honeypot provides a real system the attacker can interact with. In contrast, a low-interaction honeypot simulates only some parts, for example, the network stack [67]. A high-interaction honeypot can be compromised completely, allowing an adversary to gain full access to the system and use it to launch further network attacks. Low-interaction honeypots, by contrast, simulate only services that cannot be exploited to get complete access to the honeypot. They are more limited, but they are useful for gathering information at a higher level, for example, to learn about network probes or worm activity. They can also be used to analyze spammers or for active countermeasures against worms; see Chapter 10 for an overview of case studies on how to use different kinds of honeypots. Neither of these two approaches is superior to the other; each has unique advantages and disadvantages that we will examine in this book.

We also differentiate between physical and virtual honeypots. A physical honeypot is a real machine on the network with its own IP address. A virtual honeypot is simulated by another machine that responds to network traffic sent to the virtual honeypot.

When gathering information about network attacks or probes, the number of deployed honeypots influences the amount and accuracy of the collected data. A good example is measuring the activity of HTTP-based worms [68]. We can identify these worms only after they complete a TCP handshake and send their payload. However, most of their connection requests will go unanswered because they contact randomly chosen IP addresses. A honeypot can capture the worm payload by configuring it to function as a web server or by simulating vulnerable network services. The more honeypots we deploy, the more likely one of them is contacted by a worm.
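The relationship between the number of honeypots and the chance of being contacted can be made concrete with a small calculation. The following is only a sketch under simplifying assumptions that are not part of the text above: the worm picks every target uniformly at random from the whole IPv4 address space, and each probe is independent. The function name and parameters are chosen for illustration.

```python
IPV4_ADDRESS_SPACE = 2 ** 32  # assumption: every IPv4 address is equally likely to be probed


def contact_probability(num_honeypots: int, num_probes: int,
                        address_space: int = IPV4_ADDRESS_SPACE) -> float:
    """Probability that at least one of num_probes random probes reaches a honeypot."""
    if not 0 <= num_honeypots <= address_space:
        raise ValueError("num_honeypots must be between 0 and address_space")
    if num_probes < 0:
        raise ValueError("num_probes must not be negative")
    # A single probe misses all honeypots with probability 1 - n/N.
    miss_once = 1.0 - num_honeypots / address_space
    # All probes are assumed independent, so the misses multiply.
    return 1.0 - miss_once ** num_probes


if __name__ == "__main__":
    # Compare a single honeypot with a /24 and a /16 worth of honeypot addresses.
    for honeypots in (1, 256, 65536):
        p = contact_probability(honeypots, num_probes=1_000_000)
        print(f"{honeypots:>6} honeypots: {p:.6f}")
```

Under these assumptions the probability grows with the number of monitored addresses, which is the argument made above for deploying many honeypots. Real worms often prefer nearby address ranges, so actual numbers will differ.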

There are several different types of honeypots, and the types can be mixed and matched, as we explain in detail in the following chapters. We start with an overview of the different honeypot types before diving deeper into virtual honeypots in the later chapters.

1.2.1. High-Interaction Honeypots

A high-interaction honeypot is a conventional computer system — for example, a commercial off-the-shelf (COTS) computer, a router, or a switch. This system has no conventional task in the network and no regularly active users. Thus, it should neither have any unusual processes nor generate any network traffic except regular daemons or services running on the system. These assumptions aid in attack detection: Every interaction with the high-interaction honeypot is suspicious and could point to a possibly malicious action. Hence, all network traffic to and from the honeypot is logged. In addition, system activity is recorded for later analysis.

We can also combine several honeypots into a network of honeypots: a honeynet. Usually, a honeynet consists of several honeypots of different types (different platforms and/or operating systems). This allows us to simultaneously collect data about different types of attacks. Usually we can learn in-depth information about attacks and therefore get qualitative results about attacker behavior.

A honeynet creates a fishbowl environment that allows attackers to interact with the system while giving the operator the ability to capture all of their activity. This fishbowl also controls the attackers' actions, mitigating the risk of them damaging any nonhoneypot systems. One key element to a honeynet deployment is called the Honeywall, a layer 2 bridging device that separates the honeynet from the rest of the network. This device mitigates risk through data control and captures data for analysis. Tools on the Honeywall allow for analysis of an attacker's activities. Any inbound or outbound traffic to the honeypots must pass through the Honeywall. Information is captured using a variety of methods, including passive network sniffers, IDS alerts, firewall logs, and the kernel module known as Sebek, which we introduce in detail in Section 2.5.1. The attacker's activities are controlled at the network level, with all outbound connections filtered through both an intrusion prevention system and a connection limiter.
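To illustrate the idea of a connection limiter, here is a minimal sketch of a sliding-window counter for outbound connections. It is not the Honeywall's implementation; the class name, the per-honeypot bookkeeping, and the limit values are assumptions made for this example. A real deployment would enforce the limit in the packet path rather than in application code.

```python
from collections import defaultdict, deque


class ConnectionLimiter:
    """Allow at most max_connections outbound connections per honeypot per time window."""

    def __init__(self, max_connections: int, window_seconds: float):
        self.max_connections = max_connections
        self.window_seconds = window_seconds
        # Timestamps of recently allowed connections, keyed by honeypot address.
        self._history = defaultdict(deque)

    def allow(self, honeypot_ip: str, now: float) -> bool:
        """Return True if a new outbound connection at time `now` is within the limit."""
        recent = self._history[honeypot_ip]
        # Forget connections that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_connections:
            return False  # limit reached: the connection would be dropped
        recent.append(now)
        return True


if __name__ == "__main__":
    limiter = ConnectionLimiter(max_connections=2, window_seconds=60)
    for t in (0, 1, 2, 61):
        print(t, limiter.allow("10.0.0.5", now=t))
```

Limiting rather than blocking outbound connections is a trade-off: the attacker can still fetch tools, which we want to capture, but cannot use the honeypot for a large scan or flood.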

One of the drawbacks of high-interaction honeypots is the higher maintenance: You should carefully monitor your honeypot and closely observe what is happening. Analyzing a compromise also takes some time. In our experience, analyzing a complete incident can take hours or even several days until you fully understand what the attacker wanted to achieve!

High-interaction honeypots can be fully compromised. They run real operating systems with all their flaws. No emulation is used; the attacker interacts with a real system and real services, allowing us to capture extensive information on threats. We can capture the exploits of attackers as they gain unauthorized access, monitor their keystrokes, recover their tools, or learn what their motives are. The disadvantage of high-interaction solutions is their increased risk: Because attackers can potentially gain full access to the operating system, they can use it to harm other nonhoneypot systems. Further challenges are their expense and the difficulty of scaling them to a large number of machines. We introduce high-interaction honeypots in more detail in Chapter 2.

1.2.2. Low-Interaction Honeypots

In contrast, low-interaction honeypots emulate services, network stacks, or other aspects of a real machine. They allow an attacker a limited interaction with the target system and allow us to learn mainly quantitative information about attacks. For example, an emulated HTTP server could just respond to a request for one particular file and only implement a subset of the whole HTTP specification. The level of interaction should be "just enough" to trick an attacker or an automated tool, such as a worm that is looking for a specific file to compromise the server. The advantage of low-interaction honeypots is their simplicity and easy maintenance. Normally you can just deploy your low-interaction honeypot and let it collect data for you. This data could be information about propagating network worms or scans caused by spammers for open network relays. Moreover, installation is generally easier for this kind of honeypot: You just install and configure a tool and you are already done. In contrast, high-interaction honeypots are just a general methodology that you have to customize for your environment.
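The emulated HTTP server mentioned above can be sketched in a few lines using only the Python standard library. This is an illustration, not one of the tools presented in this book: the lure path, the server banner, and all names are hypothetical. The server answers requests for exactly one file, returns 404 for everything else, and records every request it sees.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

LURE_PATH = "/scripts/root.exe"  # hypothetical file an automated tool might request
captured = []                    # (client address, request line, request body)


class LureHandler(BaseHTTPRequestHandler):
    server_version = "ExampleHTTPD/1.0"  # hypothetical banner shown to the client
    sys_version = ""

    def _record(self):
        # Keep whatever the client sent; on a honeypot every request is of interest.
        try:
            length = int(self.headers.get("Content-Length", 0))
        except ValueError:
            length = 0
        body = self.rfile.read(length) if length > 0 else b""
        captured.append((self.client_address[0], self.requestline, body))

    def do_GET(self):
        self._record()
        # Ignore the query string so that "/scripts/root.exe?/c+dir" also matches.
        if self.path.split("?", 1)[0] == LURE_PATH:
            body = b"placeholder response\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    do_POST = do_GET  # treat POST like GET; only a subset of HTTP is emulated

    def log_message(self, format, *args):
        pass  # suppress the default stderr log; requests are kept in `captured`


def start_honeypot(host="127.0.0.1", port=0):
    """Start the emulated server in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), LureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The sketch binds to the loopback address by default so that it can be tried safely. It implements "just enough" HTTP to answer a probe and is not hardened for exposure to real attack traffic.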

Low-interaction honeypots are primarily used to gather statistical data and to collect high-level information about attack patterns. Furthermore, they can be used as a kind of intrusion detection system where they provide an early warning, i.e., a kind of burglar alarm, about new attacks (see Chapter 10). Moreover, they can be deployed to lure attackers away from production machines [19,67,87]. In addition, low-interaction honeypots can be used to detect worms, distract adversaries, or learn about ongoing network attacks. We will introduce many different types of low-interaction honeypots throughout the book. Low-interaction honeypots can also be combined into a network, forming a low-interaction honeynet.

An attacker is not able to fully compromise the system since he interacts just with a simulation. Low-interaction honeypots construct a controlled environment and thus the risk involved is limited: The attacker should not be able to completely compromise the system, and thus you do not have to fear that he abuses your low-interaction honeypots.

There are many different low-interaction honeypots available. In Chapter 3, we present several solutions and show how to use them. Moreover, later chapters focus on specific tools and present them in great detail.

Table 1.1 provides a summarized overview of high- and low-interaction honeypots, contrasting the important advantages and disadvantages of each approach.

Table 1.1. Advantages and Disadvantages of High- and Low-Interaction Honeypots

High-Interaction                          | Low-Interaction
------------------------------------------+------------------------------------------------------
Real services, OS's, or applications      | Emulation of TCP/IP stack, vulnerabilities, and so on
Higher risk                               | Lower risk
Hard to deploy and maintain               | Easy to deploy and maintain
Capture extensive amount of information   | Capture quantitative information about attacks


1.2.3. Physical Honeypots

Another possible distinction in the area of honeypots is between physical and virtual honeypots. A physical honeypot runs on a physical machine. Physical often implies high-interaction, thus allowing the system to be compromised completely. Physical honeypots are typically expensive to install and maintain. For large address spaces, it is impractical or impossible to deploy a physical honeypot for each IP address. In that case, we need to deploy virtual honeypots.

1.2.4. Virtual Honeypots

In this book, we focus on virtual honeypots. Why are these kinds of honeypots so interesting? The main reasons are scalability and ease of maintenance. We can have thousands of honeypots on just one machine. They are inexpensive to deploy and accessible to almost everyone.

Compared to physical honeypots, this approach is more lightweight. Instead of deploying a separate physical computer system for each honeypot, we can deploy one physical computer that hosts several virtual machines that act as honeypots. This leads to easier maintenance and lower physical requirements. Usually VMware [103] or User-Mode Linux (UML) [102] is used to set up such virtual honeypots. These two tools allow us to run multiple operating systems and their applications concurrently on a single physical machine, making it much easier to collect data. We introduce both tools in more detail in Chapter 2. Other types of virtual honeypots are introduced in the other chapters, in which we focus on different aspects of honeypots. Since the complete book focuses on virtual honeypots, we will not go into too much detail here. The main aspect you should keep in mind is that a virtual honeypot is simulated by another machine that responds to network traffic sent to the virtual honeypot.
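The idea that one machine answers on behalf of many virtual honeypots can be sketched as a simple lookup from destination address to a honeypot template. This is a conceptual illustration only: the class names, the template fields, and the "RST" placeholder for closed ports are assumptions for this example and do not describe how any particular tool works.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class HoneypotTemplate:
    """Hypothetical description of one kind of virtual honeypot."""
    name: str
    banners: dict = field(default_factory=dict)  # open port -> banner sent on connect


class VirtualHoneypotHost:
    """One machine that responds to traffic addressed to many virtual honeypots."""

    def __init__(self):
        self._by_address = {}

    def bind(self, network: str, template: HoneypotTemplate) -> int:
        """Assign a template to every host address in `network`; return how many."""
        hosts = list(ipaddress.ip_network(network).hosts())
        for address in hosts:
            self._by_address[address] = template
        return len(hosts)

    def respond(self, dst_ip: str, dst_port: int):
        """Decide how to answer a connection attempt to dst_ip:dst_port."""
        template = self._by_address.get(ipaddress.ip_address(dst_ip))
        if template is None:
            return None  # the address is not one of our virtual honeypots
        # Open ports answer with their banner; closed ports are reset.
        return template.banners.get(dst_port, "RST")


if __name__ == "__main__":
    host = VirtualHoneypotHost()
    linux_box = HoneypotTemplate("linux", {22: "SSH-2.0-ExampleSSH"})
    print(host.bind("10.0.0.0/24", linux_box), "virtual honeypots on one machine")
    print(host.respond("10.0.0.7", 22))
```

Because each virtual honeypot is only a table entry, the number of honeypots is bounded by the address space routed to the machine rather than by available hardware.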

For any honeypot to work, the external Internet needs to be able to reach it. Many of us are connected to the Internet via DSL or cable modems. These devices usually employ network address translation (NAT). Even though you might have a complete network behind the modem, your internal network is not reachable from the Internet. As such, you are not going to get valuable data by deploying a honeypot on a NATed network. Some NAT devices allow you to change the port-forwarding configuration and thus get at least a little exposure to the Internet. For more serious experiments, you should find an ISP that provides you with real, unfiltered IP connectivity.

1.2.5. Legal Aspects

Honeypots have some risks. If an attacker manages to compromise one of your honeypots, he could try to attack other systems that are not under your control. These systems can be located anywhere on the Internet, and the attacker could use your honeypot as a stepping stone to attack sensitive systems. This implies some legal problems when running a honeypot, but different laws in different countries make it hard to give a consistent overview of the legal situation. We will not discuss the legal aspects of operating a honeypot system in detail. If you live in the United States, you can find information about the relevant laws in Richard Salgado's chapter in Know Your Enemy (http://www.honeynet.org/book/). The laws are similar in most countries, especially in Europe and the United States. You must consider certain issues: for example, your ISP could explicitly prohibit running a honeypot on your IP address or, due to unforeseen steps by an adversary, other machines might be compromised. If you are unsure about what you are doing, consult a lawyer. You should also contact a local Honeynet group, which can give you an overview of the legal situation in your country. You can find an overview of the Honeynet groups around the world at the website of the Honeynet Project, available at http://www.honeynet.org/.