
Troubleshooting BIND9 and DNS Issues

If you have problems the first few times you set up DNS, you are not alone. Most people run into trouble at some point, until they've worked enough with DNS to figure out where problems are likely to hide. In this section, we show you some of the most common DNS-related problems and offer some tips on getting your service running again.

Caution 

If you use the Red Hat GUI configuration tool redhat-config-bind, you may run into serious trouble. This tool overwrites your regular files and can make it difficult to diagnose problems. In addition, once you commit to working with the GUI tool, you cannot return to DNS configuration at the command line. (Think of it like switching to synthetic oil in your car.) Other GUI tools may hide crucial data in nonstandard locations or might even fail to parse all the options available in the service. We strongly recommend that you work with BIND9 and DNS services at the command line with a text editor.

Luckily, most DNS problems can be resolved with regular command line programs. Traditionally, the nslookup program has been the primary troubleshooting choice, but the newer program dig has quickly supplanted it. The output from dig provides a great deal of information that can help you fix your DNS server issues quickly and accurately.

For example, you can use dig to "walk" the DNS tree for a given domain, as demonstrated earlier in this chapter. You can also use dig to request a complete zone transfer from a specified name server, which is a very useful troubleshooting technique (or security check). To do so, issue this command:

# dig example.com axfr @10.1.1.1

; <<>> DiG 9.2.2-P3 <<>> example.com axfr @10.1.1.1
;; global options:  printcmd
example.com.          86400   IN      SOA     example.com.
    tom.yahoo.com.example.com. 2004011824 10800 900 604800 86400
example.com.          86400   IN      A       10.1.1.1
example.com.          86400   IN      NS      ns.example.com.
example.com.          86400   IN      NS      ns2.example.com.
ftp.example.com.      86400   IN      CNAME   example.com.
mail.example.com.     86400   IN      CNAME   example.com.
ns.example.com.       86400   IN      A       10.1.1.1
ns2.example.com.      86400   IN      A       192.168.128.3
webdav.example.com.   86400   IN      CNAME   example.com.
www.example.com.      86400   IN      CNAME   example.com.
example.com.          86400   IN      SOA     example.com.
    tom.yahoo.com.example.com. 2004011824 10800 900 604800 86400
;; Query time: 3 msec
;; SERVER: 10.1.1.1#53(10.1.1.1)
;; WHEN: Mon Jan 19 01:21:40 2004
;; XFR size: 12 records

While this is useful if you are the administrator of example.com, think how much trouble it could cause if someone else were able to suck down all your unsecured reverse DNS records. Handing over a complete zone record, which contains every IP address on the network, is not high on our list of Secure Administrative Policies.
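One way to run that security check yourself is to request a transfer from a host that should not be allowed one and see whether any records come back. The following is a minimal sketch in Python, assuming dig is installed and on the PATH. The function names are hypothetical, and the check simply counts answer lines rather than relying on any particular error message from dig.

```python
import subprocess


def count_records(dig_output):
    """Count resource-record lines, skipping blank lines and ';' comments."""
    return sum(
        1
        for line in dig_output.splitlines()
        if line.strip() and not line.lstrip().startswith(";")
    )


def zone_transfer_allowed(zone, server):
    """Return True if the server hands back any records for an AXFR request."""
    result = subprocess.run(
        ["dig", zone, "axfr", "@" + server],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return count_records(result.stdout) > 0
```

With the sample zone and server from this section, the call would be zone_transfer_allowed("example.com", "10.1.1.1"). If it returns True when run from a host that has no business pulling the zone, revisit the zone transfer settings in /etc/named.conf.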

Finding DNS problems can be a bit tricky. You need to think about the tools you use, the settings in /etc/named.conf versus settings in zone files or slave server settings, the zone transfer settings you've chosen, and what you're resolving against. Not to mention things like iptables and firewalling rules!

The entries in the /var/log/messages log file can be very helpful in narrowing down possible solutions. The remainder of this section offers solutions to common DNS service problems.

The Slave Name Server is Not Updating Itself

One common DNS problem involves slave name servers. If you change the master example.com.zone file and restart the service, but the slave name server does not also update itself, external DNS requests might fail or receive the wrong information. To solve this problem, think about how the master server works.

When a zone file is changed and named restarts, the daemon sends a NOTIFY message to the zone's other name servers. The message should prompt the slave server to check the zone's serial number and request a zone transfer. To see whether your named did this, check your log files:

# tail /var/log/messages
...
Jan 19 02:33:47 localhost named[7665]: zone example.com/IN: loaded serial
    2004011825
Jan 19 02:33:47 localhost named[7665]: zone localhost/IN: loaded serial 42
Jan 19 02:33:47 localhost named[7665]: running
Jan 19 02:33:47 localhost named[7665]: zone example.com/IN: sending
    notifies (serial 2004011825)

If the NOTIFY command was executed properly, the next line in the log should have been

Jan 19 02:20:58 localhost named[7528]: client 192.168.128.3#33301: transfer
    of 'example.com/IN': AXFR-style IXFR started

Since this line did not display, the slave server at 192.168.128.3 did not perform a zone transfer. Thus, there's a problem. Perhaps the slave server can't find the master server or there is another configuration error. Use dig to trace the existing configuration:

# dig ns2.example.com @10.1.1.1
...
;; ANSWER SECTION:
ns2.example.com.    86400 IN   A    192.168.128.3

;; AUTHORITY SECTION:
example.com.        86400 IN   NS   ns.example.com.
example.com.        86400 IN   NS   ns2.example.com.example.com.

The A record is fine, but note the oddness in the NS records. Why is there a double domain error here? Open your zone file, and you'll see the problem:

   2004011825   ;
                          3H              ; refresh
                          15M             ; retry
                          1W              ; expiry
                          1D )            ; minimum
   @            1D IN NS     ns.example.com.
                1D IN NS     ns2.example.com

There it is, on the last line; or rather, there it isn't. Remember that you need to supply a trailing dot for every fully qualified domain name. Since this entry doesn't have a trailing dot, named appends the zone name to it, the NS record is broken, and your slave server can't update. Simply add the dot, increment the serial number, save the file, and restart the service again.
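After the fix, one way to confirm that the slave has caught up is to compare the SOA serial that each server reports. The following is a minimal sketch in Python, assuming dig is available on the PATH. The helper names are hypothetical and are not part of BIND.

```python
import subprocess


def parse_soa_serial(soa_line):
    """Extract the serial from one line of `dig +short` SOA output.

    The short form is: mname rname serial refresh retry expire minimum
    """
    fields = soa_line.split()
    if len(fields) < 7:
        raise ValueError("unexpected SOA answer: %r" % soa_line)
    return int(fields[2])


def soa_serial(zone, server):
    """Ask one server for the zone's SOA record and return its serial."""
    result = subprocess.run(
        ["dig", "+short", "SOA", zone, "@" + server],
        capture_output=True,
        text=True,
        timeout=15,
        check=True,
    )
    return parse_soa_serial(result.stdout.strip())


def slave_in_sync(zone, master, slave):
    """True when both servers report the same serial for the zone."""
    return soa_serial(zone, master) == soa_serial(zone, slave)
```

For the servers used in this section, the call would be slave_in_sync("example.com", "10.1.1.1", "192.168.128.3"). A mismatch that persists after a restart suggests the slave still has not transferred the zone.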

Using whois Effectively

With the explosion of domain name registrars across the world, the simple whois command isn't as immediately helpful as it used to be. For general use, whois is used with this syntax:

   # whois domain-name

as in

   # whois wiley.com
   Domain Name: WILEY.COM
      Registrar: REGISTER.COM, INC.
      Whois Server: whois.register.com
      Referral URL: http://www.register.com
      Name Server: JWS-EDCP.WILEY.COM
      Name Server: NS1.WILEYPUB.COM
      Status: ACTIVE
      Updated Date: 21-nov-2003
      Creation Date: 12-oct-1994
      Expiration Date: 11-oct-2011

However, simple whois is reliable only for domain names in the .com, .net, and .edu TLDs. To get a more accurate report of domain ownership, issue whois against a specific whois server. The following code block shows the command issued with three widely used whois servers:

   # whois domain-name@whois.internic.net
   # whois domain-name@whois.register.com
   # whois domain-name@whois.geektools.com

If you can't get the result you need from one of these servers, and you're looking for a site in a different TLD, find the whois server for that domain's registrar of record. If you query that whois server, you should get the information you seek.
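Under the hood, a whois lookup against a specific server is a plain-text exchange over TCP port 43: the client sends the query followed by a line ending, and the server replies and closes the connection. The following is a minimal sketch of that exchange in Python. The function names are hypothetical, and real whois clients add referral handling that is omitted here.

```python
import socket


def build_query(domain):
    """A whois request is just the query text followed by CRLF."""
    return domain.strip().encode("ascii") + b"\r\n"


def whois(domain, server, port=43, timeout=10.0):
    """Send one query to a whois server and return the raw text reply."""
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, whois("wiley.com", "whois.internic.net") would query the first server listed above.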

A New Alias or Address Record Won't Load

If your DNS server is up and running, everything may seem to be fine. However, if you add a new alias or address record at a later point, you may find that it won't load, no matter how many times you reload the zone files. Everything may seem to be in order, but clearly there is a problem. Bryan Bailey, a Rackspace Linux support sysadmin and RHCE, suggests the following approach.

Zone files are quite prone to user error. Think about the unusual syntax of entries in this file. You must use this syntax exactly when you add a new record, or the record will not load. As in the previous example, the trailing dot is the most common zone file omission.

The zone file shown here contains a CNAME record that will not load because it has a missing dot:

   $TTL 38400
   foo.com.    IN      SOA    ns.foo.com. hostmaster.foo.com. (
                     2003123166
                     10800
                     3600
                     604800
                     38400 )

   foo.com.          IN       NS     ns.foo.com.
   foo.com.          IN       A      192.168.0.1
   www               IN       CNAME  foo.com.
   mail              IN       CNAME  foo.com.
   pop3              IN       CNAME  foo.com.
   smtp              IN       CNAME  foo.com.
   ftp               IN       CNAME  foo.com.
   mysubdomain       IN       CNAME  foo.com

While named will reload the zone file without error, the added entry will never resolve. Because the target has no trailing dot, named appends the zone name to it, so mysubdomain.foo.com becomes an alias for the nonexistent name foo.com.foo.com.
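The rule behind this behavior is simple: any name in a zone file that does not end in a dot is treated as relative, and the zone's origin is appended to it. The following Python sketch only illustrates that rule. The function name is hypothetical, and this is not how named itself is implemented.

```python
def qualify(name, origin):
    """Return the fully qualified form of a zone-file name.

    Names ending in a dot are already absolute, "@" stands for the origin,
    and anything else is relative and gets the origin appended.
    """
    if not origin.endswith("."):
        raise ValueError("origin must be fully qualified")
    if name == "@":
        return origin
    if name.endswith("."):
        return name
    if origin == ".":
        return name + "."
    return name + "." + origin


print(qualify("mysubdomain", "foo.com."))  # owner name: mysubdomain.foo.com.
print(qualify("foo.com", "foo.com."))      # broken target: foo.com.foo.com.
print(qualify("foo.com.", "foo.com."))     # intended target: foo.com.
```

The same rule explains why short names such as www work without a dot: they are meant to be relative to the zone.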

To fix this problem, just open the file in a text editor and add the dot to the final entry. Save the file, exit, and restart the service. Try to make a habit of checking the last line in every zone file to ensure that the trailing dot is there. For some reason, it's the final line that always seems to be the culprit.

Note 

Always remember to increment the serial number in the zone file, whether you are adding a new record or fixing a problem. Compare this corrected version of the zone file to the problematic version shown previously:

   $TTL 38400
   foo.com.    IN   SOA    ns.foo.com. hostmaster.foo.com. (
                 2003123167
                    10800
                    3600
                    604800
                    38400 )

   foo.com.       IN     NS      ns.foo.com.
   foo.com.       IN     A       192.168.0.1
   www            IN     CNAME   foo.com.
   mail           IN     CNAME   foo.com.
   pop3           IN     CNAME   foo.com.
   smtp           IN     CNAME   foo.com.
   ftp            IN     CNAME   foo.com.
   mysubdomain    IN     CNAME   foo.com.

Note that the serial number has been incremented by one, and the final trailing dot added. The zone file should now work properly.
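The serial numbers in these examples look like the common YYYYMMDDNN convention (a date followed by a two-digit revision), although the text does not say so; that reading is an assumption. Under that assumption, a small helper can pick the next serial so that it always increases. This is a hedged sketch in Python with a hypothetical function name.

```python
from datetime import date


def next_serial(current, today):
    """Return a serial greater than `current`, preferring today's date."""
    base = int(today.strftime("%Y%m%d")) * 100
    # Start a fresh revision counter on a new day; otherwise just add one.
    candidate = base if base > current else current + 1
    if candidate > 0xFFFFFFFF:
        raise ValueError("serial does not fit in 32 bits")
    return candidate
```

For the zone file above, next_serial(2003123166, date(2003, 12, 31)) gives 2003123167, which matches the corrected listing.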

Automated DNS Zone File Troubleshooting

Allen Rouse, a Rackspace Linux support sysadmin who does a lot of DNS troubleshooting, offers the following tip for easy automation.

Problems like the missing trailing dot, as well as malformed PTR records, bad SOA records, and many other zone file abnormalities and typos, can be a real pain to track down. Rackspace sysadmins often use a special tool to scan for these problems automatically after changing a zone file but before restarting the customer's name server, which keeps the server from crashing on a bad zone file.

Try the DNS administrator tool dlint (www.domtools.com/dns/dlint.shtml). It's worth its weight in gold for the busy DNS administrator.
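To give a feel for what such a scan involves, here is a much-simplified sketch in Python that flags one class of error only: record targets that contain a dot but do not end in one. It is not dlint and does not reproduce its checks. The function name, the set of record types, and the heuristic are all assumptions made for illustration, and the sketch expects owner names to start in column one, as they do in a real zone file.

```python
# Record types whose data field is a domain name (illustrative subset).
NAME_TYPES = {"NS", "CNAME", "PTR", "MX"}


def find_missing_dots(zone_text):
    """Return (line number, target) pairs for suspicious unqualified names."""
    problems = []
    for lineno, raw in enumerate(zone_text.splitlines(), start=1):
        fields = raw.split(";", 1)[0].split()  # ignore comments
        # A line that starts in column one begins with an owner name; skip it.
        if raw[:1] not in ("", " ", "\t"):
            fields = fields[1:]
        # Skip an optional TTL (starts with a digit) and the class (IN).
        while fields and (fields[0].upper() == "IN" or fields[0][0].isdigit()):
            fields = fields[1:]
        if len(fields) >= 2 and fields[0].upper() in NAME_TYPES:
            target = fields[-1]  # last field, so an MX preference is skipped
            # Dotless names such as "www" are legitimately relative.
            if "." in target and not target.endswith("."):
                problems.append((lineno, target))
    return problems
```

Run against the broken foo.com zone file shown earlier, a check like this would report the mysubdomain line and its target foo.com.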

Troubleshooting Tools

Sometimes administrators can't solve a DNS problem simply because they don't know where to find the right tool. There are a number of useful DNS troubleshooting tools on the web. If you can't find the answer on one of these sites, you should at least be able to find links to other resources that might solve your problem:

