19.1 Parsing with JAXP and SAX

The first thing you want to do with an XML document is parse it. There are two commonly used approaches to XML parsing: they go by the acronyms SAX and DOM. We'll begin with SAX parsing; DOM parsing is covered later in the chapter.

SAX is the Simple API for XML. SAX is not a parser, but rather a Java API that describes how a parser operates. When parsing an XML document using the SAX API, you define a class that implements various "event" handling methods. As the parser encounters the elements, text, and other constructs of the XML document, it invokes the corresponding event-handler methods you've defined. Your methods take whatever actions are required to accomplish the desired task. In the SAX model, the parser converts an XML document into a sequence of Java method calls. The parser doesn't build a parse tree of any kind (although your methods can do this, if you want). Because it never holds the whole document in memory, SAX parsing is typically quite efficient and is often your best choice for simple XML processing tasks. SAX-style XML parsing is known as "push parsing" because the parser "pushes" events to your event-handler methods. This is in contrast to "pull parsing," in which your code "pulls" tokens from a parser.
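
To make the push model concrete, here is a minimal sketch that records each callback the parser makes. The class name SaxEventTrace and its trace( ) helper are hypothetical and are not part of the book's examples; the sketch assumes the default parser that ships with the JDK and uses generics, which the Java 1.4-era listing later in this section does not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: records every SAX callback as a string. */
public class SaxEventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Text may arrive in several chunks, so we only accumulate it here
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        flushText();
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end " + qName);
    }

    // Record any non-blank accumulated text as a single event
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) events.add("text " + s);
        text.setLength(0);
    }

    /** Parse the given XML string and return the recorded events. */
    public static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxEventTrace handler = new SaxEventTrace();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<servlet><servlet-name>Hello</servlet-name></servlet>";
        for (String event : trace(xml)) System.out.println(event);
    }
}
```

The handler never asks the parser for anything: it simply receives calls in document order, which is what "push" means here.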

The SAX API was created by David Megginson (http://www.megginson.com/) and is now maintained at http://www.saxproject.org. The Java binding of the SAX API consists of the package org.xml.sax and its subpackages. SAX is a de facto standard but has not been standardized by any official body. There are two versions of the SAX API. Version 2 is substantially different from the original Version 1, and is today the most common. We cover Version 2 in this chapter.

SAX is an API, not an implementation. Various XML parsers implement the SAX API, and in order to use SAX you need an underlying parser implementation. This is where JAXP comes in. JAXP is the Java API for XML Processing (originally named the Java API for XML Parsing), and was added to J2SE in Java 1.4.^[1] JAXP consists of the javax.xml.parsers package, and also javax.xml.transform, which we'll consider later in this chapter. JAXP provides a thin layer on top of SAX (and on top of DOM, which we'll also see later) and standardizes an API for obtaining and using SAX (and DOM) parser objects. JAXP includes default parser implementations but allows other parsers to be easily plugged in and configured using system properties.

^[1] Prior to Java 1.4, it was available as a standard extension.
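
As a rough illustration of this layering, the following sketch asks JAXP for a SAX parser and reports which implementation it obtained. The class name ParserLookup is hypothetical. The system property javax.xml.parsers.SAXParserFactory is the one JAXP consults when choosing a factory; it is normally unset, in which case the default implementation is used.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

/** Hypothetical demo class: shows which parser JAXP hands back. */
public class ParserLookup {
    /** Return the class name of the XMLReader behind the JAXP parser. */
    public static String readerClassName() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);      // No validation
        factory.setNamespaceAware(false);  // No namespace processing
        SAXParser parser = factory.newSAXParser();
        // The SAXParser wraps an org.xml.sax.XMLReader that does the work
        return parser.getXMLReader().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // Prints null unless a factory was chosen with -D on the command line
        System.out.println("Factory property: " +
            System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("XMLReader class:  " + readerClassName());
    }
}
```

The class name printed depends on the Java installation and on any parser plugged in through the system property, so no particular output is shown here.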

Example 19-1 is a listing of ListServlets.java, a program that uses JAXP and SAX to parse a web application deployment descriptor and list the names of the servlets configured by that file. We'll see servlets and their deployment descriptors in Chapter 20, but until then you just need to know that servlet-based web applications are configured using an XML file named web.xml. This file contains <servlet> tags that define mappings between servlet names and the Java classes that implement them. It also contains <servlet-mapping> tags that map from servlet name to a URL or URL pattern by which the servlet is invoked. The ListServlets program parses a web.xml file and stores the name-to-class and name-to-URL mappings, printing out a summary when it reaches the end of the file. To help you understand what the example does, here is an excerpt from the web.xml file developed in Chapter 20:

<servlet>
  <servlet-name>Hello</servlet-name>
  <servlet-class>je3.servlet.HelloNet</servlet-class>
</servlet>

<!-- The Counter servlet uses initialization parameters -->
<servlet>
  <servlet-name>Counter</servlet-name>  
  <servlet-class>je3.servlet.Counter</servlet-class>
  <init-param>
    <param-name>countfile</param-name>         <!-- where to save state -->
    <param-value>/tmp/counts.ser</param-value> <!-- adjust for your system-->
  </init-param>
  <init-param>
    <param-name>saveInterval</param-name>      <!-- how often to save -->
    <param-value>30000</param-value>           <!-- every 30 seconds -->
  </init-param>
</servlet>

<servlet-mapping>
  <servlet-name>Hello</servlet-name>
  <url-pattern>/Hello</url-pattern>
</servlet-mapping>

<servlet-mapping>
  <servlet-name>Counter</servlet-name>
  <url-pattern>/Counter</url-pattern>
</servlet-mapping>

<!-- Note the wildcard below: any URL ending in .count invokes Counter -->
<servlet-mapping> 
  <servlet-name>Counter</servlet-name>
  <url-pattern>*.count</url-pattern>
</servlet-mapping>

ListServlets.java includes a main( ) method that uses the JAXP API to obtain a SAX parser instance. It then passes the File to parse, along with an instance of the ListServlets class, to the parser. The parser starts running and invokes the ListServlets instance methods as it encounters XML elements in the file.

ListServlets extends the SAX org.xml.sax.helpers.DefaultHandler class. This superclass provides dummy implementations of all the SAX event-handler methods. The example simply overrides the handlers of interest. The parser calls the startElement( ) method when it reads an opening XML tag; it calls endElement( ) when it finds a closing tag. characters( ) is invoked when the parser reads a run of plain text with no markup; it may be called more than once for a single stretch of text. Finally, the parser calls warning( ), error( ), or fatalError( ) when something goes wrong in the parsing process. The implementations of these methods are written specifically to extract the desired information from a web.xml file and are based on a knowledge of the structure of this type of file.
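
The error callbacks are easiest to see with a document that is not well-formed. The sketch below is an illustration only: the class ErrorCallbackDemo and its check( ) method are hypothetical names, and the behavior shown assumes the default JDK parser, which reports a mismatched tag through fatalError( ).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: collects problems reported by the parser. */
public class ErrorCallbackDemo extends DefaultHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("WARNING: line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        problems.add("ERROR: line " + e.getLineNumber());
    }

    // Record the problem, then rethrow so that parsing stops
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        problems.add("FATAL: line " + e.getLineNumber());
        throw e;
    }

    /** Parse the XML string and return the problems reported, if any. */
    public static List<String> check(String xml)
        throws IOException, ParserConfigurationException
    {
        ErrorCallbackDemo handler = new ErrorCallbackDemo();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Already recorded by fatalError(); nothing more to do
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<a><b/></a>"));   // Well-formed input
        System.out.println(check("<a><b></a>"));    // <b> is never closed
    }
}
```

Note that error( ) is mainly used for validity errors, so with a nonvalidating parser most problems you see will arrive through fatalError( ).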

Note that web.xml files are somewhat unusual in that they don't rely on attributes for any of the XML tags. That is, servlet names are defined by a <servlet-name> tag nested within a <servlet> tag, instead of simply using a name attribute of the <servlet> tag itself. This fact makes the example program more complex than it would otherwise be. The web.xml file does allow id attributes for all its tags. Although servlet engines are not expected to use these attributes, they may be useful to a configuration tool that parses and automatically generates web.xml files. In order to demonstrate how to work with attributes in SAX, the startElement( ) method in Example 19-1 looks for an id attribute of the <servlet> tag. The value of that attribute, if it exists, is reported in the program's output.
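
Attribute lookup on its own can be sketched as follows. The class ServletIdReader and the sample document in main( ) are hypothetical and exist only for illustration; the one SAX call that matters is Attributes.getValue( ), which returns null when the named attribute is absent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: collects the id attribute of <servlet> tags. */
public class ServletIdReader extends DefaultHandler {
    private final List<String> ids = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if (qName.equals("servlet")) {
            // getValue() returns null if there is no such attribute
            String id = attributes.getValue("id");
            ids.add(id == null ? "(none)" : id);
        }
    }

    /** Return one entry per <servlet> tag, in document order. */
    public static List<String> servletIds(String xml) throws Exception {
        ServletIdReader handler = new ServletIdReader();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<web-app><servlet id=\"s1\"/><servlet/></web-app>";
        System.out.println(servletIds(xml));
    }
}
```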

To run this program, specify the path to a web.xml file on the command line. You can use the one included with the servlets examples, which is at je3/servlet/WEB-INF/web.xml.

Example 19-1. ListServlets.java

package je3.xml;
import javax.xml.parsers.*;       // JAXP classes for obtaining a SAX Parser
import org.xml.sax.*;             // The main SAX package
import org.xml.sax.helpers.*;     // SAX helper classes
import java.io.*;                 // For reading the input file
import java.util.*;               // HashMap, List, and so on

/**
 * Parse a web.xml file using the SAX2 API.
 * This class extends DefaultHandler so that instances can serve as SAX2
 * event handlers, and can be notified by the parser of parsing events.
 * We simply override the methods that receive events we're interested in
 **/
public class ListServlets extends org.xml.sax.helpers.DefaultHandler {
    /** The main method sets things up for parsing */
    public static void main(String[  ] args) 
        throws IOException, SAXException, ParserConfigurationException
    {
        // We use a SAXParserFactory to obtain a SAXParser, which
        // encapsulates an XMLReader.
        SAXParserFactory factory = SAXParserFactory.newInstance( );
        factory.setValidating(false);     // We don't want validation
        factory.setNamespaceAware(false); // No namespaces please
        // Create a SAXParser object from the factory
        SAXParser parser = factory.newSAXParser( );
        // Now parse the file specified on the command line using
        // an instance of this class to handle the parser callbacks
        parser.parse(new File(args[0]), new ListServlets( ));
    }

    HashMap nameToClass;     // Map from servlet name to servlet class name
    HashMap nameToID;        // Map from servlet name to id attribute
    HashMap nameToPatterns;  // Map from servlet name to url patterns

    StringBuffer accumulator;                         // Accumulate text
    String servletName, servletClass, servletPattern; // Remember text
    String servletID;        // Value of id attribute of <servlet> tag

    // Called at the beginning of parsing.  We use it as an init( ) method
    public void startDocument( ) {
        accumulator = new StringBuffer( );
        nameToClass = new HashMap( );
        nameToID = new HashMap( );
        nameToPatterns = new HashMap( );
    }

    // When the parser encounters plain text (not XML elements), it calls
    // this method, which accumulates them in a string buffer.
    // Note that this method may be called multiple times, even with no
    // intervening elements.
    public void characters(char[  ] buffer, int start, int length) {
        accumulator.append(buffer, start, length);
    }

    // At the beginning of each new element, erase any accumulated text.
    public void startElement(String namespaceURL, String localName,
                             String qname, Attributes attributes) {
        accumulator.setLength(0);
        // If it's a servlet tag, look for id attribute
        if (qname.equals("servlet")) servletID = attributes.getValue("id");
    }

    // Take special action when we reach the end of selected elements.
    // Although we don't use a validating parser, this method does assume
    // that the web.xml file we're parsing is valid.
    public void endElement(String namespaceURL, String localName, String qname)
    {
        // Since we've indicated that we don't want namespace-aware
        // parsing, the element name is in qname.  If we were doing
        // namespaces, then qname would include the prefix, colon, and name,
        // and localName would be the name without the prefix or colon.
        if (qname.equals("servlet-name")) {        // Store servlet name
            servletName = accumulator.toString( ).trim( );
        }
        else if (qname.equals("servlet-class")) {  // Store servlet class
            servletClass = accumulator.toString( ).trim( );
        }
        else if (qname.equals("url-pattern")) {    // Store servlet pattern
            servletPattern = accumulator.toString( ).trim( );
        }
        else if (qname.equals("servlet")) {        // Map name to class
            nameToClass.put(servletName, servletClass);
            nameToID.put(servletName, servletID);
        }
        else if (qname.equals("servlet-mapping")) {// Map name to pattern
            List patterns = (List)nameToPatterns.get(servletName);
            if (patterns == null) {
                patterns = new ArrayList( );
                nameToPatterns.put(servletName, patterns);
            }
            patterns.add(servletPattern);
        }
    }

    // Called at the end of parsing.  Used here to print our results.
    public void endDocument( ) {
        // Note the powerful uses of the Collections framework.  In two lines
        // we get the key objects of a Map as a Set, convert them to a List,
        // and sort that List alphabetically.
        List servletNames = new ArrayList(nameToClass.keySet( ));
        Collections.sort(servletNames);
        // Loop through servlet names
        for(Iterator iterator = servletNames.iterator( ); iterator.hasNext( );) {
            String name = (String)iterator.next( );
            // For each name get class and URL patterns and print them.
            String classname = (String)nameToClass.get(name);
            String id = (String)nameToID.get(name);
            List patterns = (List)nameToPatterns.get(name);
            System.out.println("Servlet: " + name);
            System.out.println("Class: " + classname);
            if (id != null) System.out.println("ID: " + id);
            if (patterns != null) {
                System.out.println("Patterns:");
                for(Iterator i = patterns.iterator( ); i.hasNext( ); ) {
                    System.out.println("\t" + i.next( ));
                }
            }
            System.out.println( );
        }
    }

    // Issue a warning
    public void warning(SAXParseException exception) {
        System.err.println("WARNING: line " + exception.getLineNumber( ) + ": "+
                           exception.getMessage( ));
    }

    // Report a parsing error
    public void error(SAXParseException exception) {
        System.err.println("ERROR: line " + exception.getLineNumber( ) + ": " +
                           exception.getMessage( ));
    }

    // Report a non-recoverable error and exit
    public void fatalError(SAXParseException exception) throws SAXException {
        System.err.println("FATAL: line " + exception.getLineNumber( ) + ": " +
                           exception.getMessage( ));
        throw(exception);
    }
}
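
The comment in endElement( ) about qname and localName can be checked with a short experiment. The following is a sketch under stated assumptions: the class QNameDemo, the prefix w, and the namespace URI urn:example are all made up for illustration, and the results described assume the default JDK parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: shows the names passed to startElement(). */
public class QNameDemo extends DefaultHandler {
    private final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        seen.add("uri=" + uri + " local=" + localName + " qname=" + qName);
    }

    /** Parse xml with or without namespace processing. */
    public static List<String> names(String xml, boolean namespaceAware)
        throws Exception
    {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        QNameDemo handler = new QNameDemo();
        factory.newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        // The prefix "w" and the URI "urn:example" are invented for this demo
        String xml = "<w:servlet xmlns:w=\"urn:example\"/>";
        System.out.println(names(xml, true));   // Namespace-aware
        System.out.println(names(xml, false));  // Not namespace-aware
    }
}
```

With namespace processing on, localName is servlet and qname is w:servlet. With it off, qname is still w:servlet, but the SAX specification allows localName to be empty, which is why Example 19-1 tests qname.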
