9.4. Using a Reverse Proxy
A proxy is an intermediary communication device. The term "proxy" commonly refers to a forward proxy, which is a gateway device that fetches web traffic on behalf of client devices. We are more interested in the opposite type of proxy. Reverse proxies are gateway devices that isolate servers from the Web and accept traffic on their behalf.
There are two reasons to add a reverse proxy to the network: security and performance. The benefits of reverse proxies stem from centralization: with a single point of entry for HTTP traffic, we are better able to monitor and control it. Therefore, the larger the network, the greater the benefit. Here are the advantages:
There are some disadvantages as well:
9.4.1. Apache Reverse Proxy
The use of Apache 2 is recommended in reverse proxy systems. The new version of the mod_proxy module offers better support for standards and conforms to the HTTP/1.1 specification. The Apache 2 architecture introduces filters, which allow many modules to look at the content (both on the input and the output) simultaneously.
The following modules will be needed:
You are unlikely to need mod_proxy_connect, which is needed for forward proxy operation only.
9.4.1.1. Setting up the reverse proxy
Compile the web server as usual. Whenever the proxy module is used within a server, turn off the forward proxying operation:
# do not work as forward proxy
ProxyRequests Off
Not turning it off is a frequent error that creates an open proxy out of a web server, allowing anyone to go through it to reach any other system the web server can reach. Spammers will want to use it to send spam to the Internet, and attackers will use the open proxy to reach the internal network.
Two directives are needed to activate the proxy:
ProxyPass / http://web.internal.com/
ProxyPassReverse / http://web.internal.com/
The first directive instructs the proxy to forward all requests it receives to the internal server web.internal.com and to forward the responses back to the client. So, when someone types the proxy address in the browser, she will be served the content from the internal web server (web.internal.com) without having to know about it or access it directly.
The same applies to the internal server. It is not aware that all requests arrive through the proxy; to it, the proxy is just another client. During normal operation, the internal server will use its real name (web.internal.com) in a response, for example in the Location header of a redirect. If such a response goes to the client unmodified, the real name of the internal server will be revealed. The client will also try to use the real name for subsequent requests, but that will probably fail because the internal name is hidden from the public and a firewall prevents access to the internal server.
This is where the second directive comes in. It instructs the proxy server to observe response headers, modify them to hide the internal information, and respond to its clients with responses that make sense to them.
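Putting the directives discussed so far together, a minimal configuration might look like the sketch below. The module paths are assumptions that depend on how Apache was built and installed, and www.example.com is a hypothetical public name for the proxy, used only to illustrate the header rewriting.

```apache
# Modules needed for HTTP reverse proxying
# (paths are assumptions; adjust to your installation)
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

# never act as a forward proxy
ProxyRequests Off

# forward every request to the internal server
ProxyPass / http://web.internal.com/

# rewrite the Location, Content-Location, and URI response headers.
# Illustration, assuming clients reach the proxy as www.example.com:
#   backend sends:    Location: http://web.internal.com/login/
#   client receives:  Location: http://www.example.com/login/
ProxyPassReverse / http://web.internal.com/
```

Only response headers are rewritten here; links inside the response body pass through unchanged.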
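Another way to use the reverse proxy is through mod_rewrite. The following has the same effect as the ProxyPass directive above, provided the rewrite engine is enabled (RewriteEngine On). Note the use of the P (proxy) and L (last rewrite directive) flags. In the server or virtual host configuration the matched path begins with a slash, so the pattern consumes it to avoid forwarding a doubled slash to the internal server.
RewriteRule ^/(.*)$ http://web.internal.com/$1 [P,L]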
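As a fuller sketch of the mod_rewrite approach, the fragment below assumes the rule is placed in the main server or virtual host configuration (not in a per-directory context) and that both mod_rewrite and mod_proxy are loaded. Because mod_rewrite does not modify response headers, ProxyPassReverse is still needed alongside the rule.

```apache
# enable the rewrite engine; rules are ignored without it
RewriteEngine On

# P hands the request to mod_proxy; L stops further rule processing.
# The leading slash is matched outside the group so the forwarded
# URL does not end up with a doubled slash.
RewriteRule ^/(.*)$ http://web.internal.com/$1 [P,L]

# response headers are still rewritten by mod_proxy
ProxyPassReverse / http://web.internal.com/
```

The main reason to prefer this form over ProxyPass is the ability to add RewriteCond conditions that decide which requests get proxied.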
At this point, one problem remains: applications often generate absolute links and embed them in HTML pages. Unlike the response headers, which Apache rewrites, absolute links in pages are left unmodified. Again, this reveals the real name of the internal server to its clients. Standard Apache cannot solve this problem, but a third-party module can: mod_proxy_html, which is maintained by Nick Kew. It can be downloaded from http://apache.webthing.com/mod_proxy_html/. It requires libxml2, which can be found at http://xmlsoft.org. (Note: the author warns against using libxml2 versions lower than 2.5.10.)
To compile the module, I had to pass the compiler the path to libxml2:
# apxs -Wc,-I/usr/include/libxml2 -cia mod_proxy_html.c
For the same reason, in the httpd.conf configuration file, you have to load the libxml2 dynamic library before attempting to load the mod_proxy_html module:
LoadFile /usr/lib/libxml2.so
LoadModule proxy_html_module modules/mod_proxy_html.so
The module looks into every HTML page, searches for absolute links referencing the internal server, and replaces them with links referencing the proxy. To activate this behavior, add the following to the configuration file:
# activate mod_proxy_html
SetOutputFilter proxy-html
# prevent content compression in backend operation
RequestHeader unset Accept-Encoding
# replace references to the internal server
# with references to this proxy
ProxyHTMLURLMap http://web.internal.com/ /
You may be wondering about the directive to prevent compression. If the client supports content decompression, it will state that with an appropriate Accept-Encoding request header, for example Accept-Encoding: gzip, deflate.
If that happens, the backend server will respond with a compressed response, but mod_proxy_html does not know how to handle compressed content and fails to do its job. By removing the header from the request, we force uncompressed communication between the reverse proxy and the backend server. This is not a problem: chances are both servers share a fast local network, where compression would do little to improve performance.
Read Nick's excellent article published in Apache Week, in which he gives more tips and tricks for reverse proxying:
9.4.2. Reverse Proxy by Network Design
The most common approach to running a reverse proxy is to design it into the network. The web server is assigned a private IP address (e.g., 192.168.0.1) instead of a real one. The reverse proxy gets a real IP address (e.g., 184.108.40.206), and this address is attached to the domain name (which is www.example.com in the following example). Configuring Apache to respond to a domain name by forwarding requests to another server is trivial:
<VirtualHost www.example.com>
    ProxyPass / http://192.168.0.1/
    ProxyPassReverse / http://192.168.0.1/
    # additional mod_proxy_html configuration
    # options can be added here if required
</VirtualHost>
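For illustration, here is one way the virtual host might look with the mod_proxy_html options filled in. This is a sketch under assumptions, not a definitive configuration: it presumes mod_proxy, mod_proxy_http, mod_headers (for RequestHeader), and mod_proxy_html are already loaded, and it reuses the addresses from the example above.

```apache
<VirtualHost www.example.com>
    # never act as a forward proxy
    ProxyRequests Off

    # forward requests to the privately addressed web server
    ProxyPass / http://192.168.0.1/
    ProxyPassReverse / http://192.168.0.1/

    # rewrite absolute links in HTML pages
    SetOutputFilter proxy-html
    # keep backend responses uncompressed so they can be parsed
    RequestHeader unset Accept-Encoding
    # map references to the private address onto this proxy
    ProxyHTMLURLMap http://192.168.0.1/ /
</VirtualHost>
```

If the application emits links using an internal hostname instead of the IP address, a second ProxyHTMLURLMap line for that name would be needed as well.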
9.4.3. Reverse Proxy by Redirecting Network Traffic
Sometimes, when faced with a network that is already up and running, it may be impossible or too difficult to reconfigure the network to introduce a reverse proxy. Under such circumstances you may decide to introduce the reverse proxy through traffic redirection on a network level. This technique is also useful when you are unsure about whether you want to proxy, and you want to see how it works before committing more resources.
The following steps show how a transparent reverse proxy is introduced to a network, assuming the gateway is capable of redirecting traffic:
The exact firewall rule depends on the type of gateway. Assuming the web server is at 192.168.1.99 and the reverse proxy is at 192.168.1.100, the following iptables command will transparently redirect all web server traffic through the proxy:
# iptables -t nat -A PREROUTING -d 192.168.1.99 -p tcp --dport 80 \
> -j DNAT --to 192.168.1.100