12.1. Evolution of Web Intrusion Detection
Intrusion detection has been in use for many years. Its purpose is to detect attacks by looking at the network traffic or by looking at operating system events. The term intrusion prevention is used to refer to systems that are also capable of preventing attacks.
Today, when people mention intrusion detection, in most cases they are referring to a network intrusion detection system (NIDS). An NIDS works on the TCP/IP level and is used to detect attacks against any network service, including the web server. The job of such systems, the most popular and most widely deployed of all IDSs, is to monitor raw network packets to spot malicious payload. Host-based intrusion detection systems (HIDSs), on the other hand, work on the host level. Though they can analyze network traffic (only the traffic that arrives to that single host), this task is usually left to NIDSs. Host-based intrusion is mostly concerned with the events that take place on the host (such as users logging in and out and executing commands) and the system error messages that are generated. An HIDS can be as simple as a script watching a log file for error messages, as mentioned in Chapter 8. Integrity validation programs (such as Tripwire) are a form of HIDS. Some systems can be complex: one form of HIDS uses system call monitoring on a kernel level to detect processes that behave suspiciously.
Using a single approach for intrusion detection is insufficient. Security information management (SIM) systems are designed to manage various security-relevant events they receive from agents, where an agent can listen to the network traffic or operating system events or can work to obtain any other security-relevant information.
Because many NIDSs are in place, a large effort was made to make the most of them and to use them for web intrusion detection, too. Though NIDSs work well for the problems they were designed to address and they can provide some help with web intrusion detection, they do not and cannot live up to the full web intrusion detection potential for the following reasons:
Vendors of NIDSs have responded to the challenges by adding extensions to better understand HTTP. The term deep-inspection firewalls refers to systems that make an additional effort to understand the network traffic on a higher level. Ultimately, a new breed of IDSs was born. Web application firewalls (WAFs), also known as web application gateways, are designed specifically to guard web applications. Designed from the ground up to support HTTP and to exploit its transactional nature, web application firewalls often work as reverse proxies. Instead of going directly to the web application, a request is rerouted to go to a WAF first and only allowed to proceed if deemed safe.
Web application firewalls were designed from the ground up to deal with web attacks and are better suited for that purpose. NIDSs are better suited for monitoring on the network level and cannot be replaced for that purpose.
Though most vendors are focusing on supporting HTTP, the concept of application firewalls can be applied to any application and protocol. Commercial products have become available that act as proxies for other popular network protocols and for popular databases. (Zorp, at http://www.balabit.com/products/zorp/, available under a commercial and open source license, is one such product.)
Learn more about intrusion detection to gain a better understanding of common problems. I have found the following resources useful:
12.1.1. Is Intrusion Detection the Right Approach?
Sometimes there is a controversy as to whether we are correct to pursue this approach to increasing security. A common counterargument is that web intrusion detection does not solve the real problem, and that it is better to go directly to the problem and fix weak web applications. I agree with this opinion generally, but the reality is preventing us from letting go from IDS techniques:
As a result, I recommend we raise awareness about security among management and developers. Since awareness will come slowly, do what you can in the meantime to increase security.
12.1.2. Log-Based Web Intrusion Detection
I already covered one form of web intrusion detection in Chapter 8. Log-based web intrusion detection makes use of the fact that web servers produce detailed access logs, where the information about every request is kept. It is also possible to create logs in special formats to control which data is collected. This cost-effective method introduces intrusion detection to a system but there is a drawback. Log-based web intrusion detection is performed only after transactions take place; therefore, attack prevention is not possible. Only detection is. If you can live with that (it is a valid decision and it depends on your threat model), then you only need to take a few steps to implement this technique:
That is all there is to it. (Refer to Chapter 8 for a detailed discussion.)
12.1.3. Real-Time Web Intrusion Detection
With real-time intrusion detection, not only can you detect problems, but you can react to them as well. Attack prevention is possible, but it comes with a price tag of increased complexity and more time required to run the system. Most of this chapter discusses the ways of running real-time web intrusion detection. There are two approaches:
Which of these two you choose depends on your circumstances. The web server-based approach is easy to implement since it does not mandate changes to the network design and configuration. All that is needed is the addition of a module to the web server. But if you have many web servers, and especially if the network contains proprietary web servers, then having a single place from which to perform intrusion detection can be the more efficient approach. Though network-based web IDSs typically perform full separation of clients and servers, web server-based solutions can be described more accurately as separating clients from applications, with servers left unprotected in the middle. In this case, therefore, network-based protection is better because it can protect from flaws in web servers, too.
With Apache and mod_security you can choose either approach to real-time web intrusion detection. If network-based web intrusion detection suits your needs best, then you can build such a node by installing an additional Apache instance with mod_security to work in a reverse proxy configuration. (Reverse proxy operation is discussed in Chapter 9.) Aside from initial configuration, the two modes of operation are similar. The rest of this chapter applies equally to both.
12.1.4. Web Intrusion Detection Features
Later in this chapter, I will present a web intrusion detection solution based on open source components. The advantage of using open source components is they are free and familiar (being based on Apache). Products from the commercial arena have more features, and they have nice user interfaces that make some tasks much easier. Here I will present the most important aspects of web IDSs, even if some features are present only in commercial products. I expect the open source products to catch up, but at this point a discussion of web intrusion detection cannot be complete without including features available only in commercial products. The following sections describe some common intrusion detection features.
188.8.131.52 Protocol anomaly detection
If you read through various RFCs, you may detect a recurring theme. Most RFCs recommend that implementations be conservative about how they use protocols, but liberal with respect to what they accept from others. Web servers behave this way too, but such behavior opens the door wide open for all sorts of attacks. Almost all IDSs perform some sort of sanity check on incoming requests and refuse to accept anything that is not in accordance with the HTTP standard. Furthermore, they can narrow down the features to those that are acceptable to the application and thus reduce the attack surface area.
184.108.40.206 Negative versus positive security models
If you have ever worked to develop a firewall policy, you may have been given (good) advice to first put rules in place to deny everything, and then proceed to allow what is safe. That is a positive security model. On the other side is a negative security model, in which everything that is not dangerous is allowed. The two approaches each ask a question:
A negative security model is used more often. You identify a dangerous pattern and configure your system to reject it. This is simple, easy, and fun, but not foolproof. The concept relies on you knowing what is dangerous. If there are aspects of the problem you are not aware of (which happens from time to time) then you have left a hole for the attacker to exploit.
A positive security model (also known as a white-list model) is a better approach to building policies and works well for firewall policy building. In the realm of web application security, a positive security model approach boils down to enumerating every script in the application. For each script in the list, you need to determine the following:
This is what programmers are supposed to do but frequently do not. Using the positive security model is better if you can afford to spend the time to develop it. One difficult aspect of this approach is that the application model changes as the application evolves. You will need to update the model every time a new script is added to the application or if an existing one changes. But it works well to protect stable, legacy applications that no one maintains anymore.
Automating policy development can ease problems:
220.127.116.11 Rule-based versus anomaly-based protection
Rule-based IDSs comprise the majority of what is available on the market. In principle, every request (or packet in the case of NIDS) is subject to a series of tests, where each test consists of one or more inspection rules. If a test fails, the request is rejected as invalid.
Rule-based IDSs are easy to build and use and are efficient when used to defend against known problems or when the task is to build a custom defense policy. But since they must know about the specifics of every threat to protect from it, these tools must rely on using extensive rule databases. Vendors maintain rule databases and distribute their tools with programs to update IDS installations automatically.
This approach is unlikely to be able to protect custom applications or to protect from zero-day exploits (exploits that attack vulnerabilities not yet publicly known). This is where anomaly-based IDSs work better.
The idea behind anomaly-based protection is to build a protection layer that will observe legal application traffic and then build a statistical model to judge the future traffic against. In theory, once trained, an anomaly-based system should detect anything out of the ordinary. With anomaly-based protection, rule databases are not needed and zero-day exploits are not a problem. Anomaly-based protection systems are difficult to build and are thus rare. Because users do not understand how they work, many refuse to trust such systems, making them less popular.
18.104.22.168 Enforcing input validation
22.214.171.124 State management
The stateless nature of the HTTP protocol has many negative impacts on web application security. Sessions can and should be implemented on the application level, but for many applications the added functionality is limited to fulfilling business requirements other than security. Web IDSs, on the other hand, can throw their full weight into adding various session-related protection features. Some of the features include:
126.96.36.199 Anti-evasion techniques
One area where network-based IDSs have had trouble with web traffic is with respect to evasion techniques (see Chapter 10). The problem is there are so many ways to alter incoming (attack) data, so it keeps the original meaning and the application interprets it, but it is modified sufficiently to sneak under the IDS radar. This is an area where dedicated web IDSs are providing significant improvement. For example, just by looking at whole HTTP requests at a time, an entire class of attacks based on request fragmentation is avoided. And because they understand HTTP well and can separate dynamic requests from requests for static resources (and so choose not to waste time protecting static requests that cannot be compromised), they can afford to apply many different anti-evasion techniques that would prove too time consuming for NIDSs.
188.8.131.52 Response monitoring and information leak prevention
Information leak prevention is a fancy name for response monitoring. In principle it is identical to request monitoring, and its goal is to watch the output for suspicious patterns and prevent the response from reaching the client when such a pattern is detected. The most likely candidates for patterns in output are credit card numbers and social security numbers. Another use for this technique is to watch for signs of successful intrusions, as I will demonstrate later in the chapter.
It is impossible to prevent information leak by a determined and skillful attacker, since he will always be able to encode the information in such a way as to prevent detection by an IDS. Still, this technique can protect when the attacker does not have full control over the server but instead tries to exploit a weakness in the application.