
15.7 XML Processing Tools

Python ships with XML parsing support in its standard library and plays host to a vigorous XML special-interest group. XML (eXtensible Markup Language) is a tag-based markup language for describing many kinds of structured data. Among other things, many companies have adopted it as a standard representation for database and Internet content. As an object-oriented scripting language, Python mixes remarkably well with XML's core notion of structured document interchange, and promises to be a major player in the XML arena.

XML is based upon a tag syntax familiar to web page writers. Python's xmllib library module includes tools for parsing XML. In short, this XML parser is used by defining a subclass of an XMLParser Python class, with methods that serve as callbacks to be invoked as various XML structures are detected. Text analysis is largely automated by the library module. This module's source code, file xmllib.py in the Python library, includes self-test code near the bottom that gives additional usage details. Python also ships with a standard HTML parser, htmllib, that works on similar principles and is based upon the sgmllib SGML parser module.
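To make the callback idea concrete, here is a minimal sketch of the same subclass-with-callbacks pattern. It is an assumption on my part to use the xml.sax module rather than xmllib, so that the code runs on Python releases that no longer include xmllib; xmllib's own method names differ. The class name TitleCollector, the collect_titles function, and the sample document are all hypothetical, made up for illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []       # finished title strings
        self._buffer = None    # text chunks gathered while inside <title>

    def startElement(self, name, attrs):
        # Invoked by the parser each time an opening tag is detected.
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Invoked for text between tags; it may arrive in several pieces.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # Invoked each time a closing tag is detected.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


# Made-up sample document, used only for this sketch.
SAMPLE = b"""<books>
  <book><title>Programming Python</title></book>
  <book><title>Learning Python</title></book>
</books>"""


def collect_titles(xml_bytes):
    """Parse xml_bytes and return the list of title strings found."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


if __name__ == "__main__":
    print(collect_titles(SAMPLE))
```

The parser drives the process: the subclass only says what to do when each kind of structure turns up, and the library handles the text analysis.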

Unfortunately, Python's XML support is still evolving, and describing it is well beyond the scope of this book. Rather than going into further details here, I will instead point you to sources for more information:

Standard library

First off, be sure to consult the Python library manual for more on the standard library's XML support tools. At the moment, this includes only the xmllib parser module, but may expand over time.

PyXML SIG package

At this writing, the best place to find Python XML tools and documentation is the XML SIG (Special Interest Group) web page at http://www.python.org (click on the "SIGs" link near the top). This SIG is dedicated to wedding XML technologies with Python, and publishes a free XML tools package distribution called PyXML. That package contains tools not yet part of the standard Python distribution, including XML parsers implemented in both C and Python, Python implementations of SAX (the Simple API for XML) and DOM (the XML Document Object Model), a Python interface to the Expat parser, sample code, documentation, and a test suite.

Third-party tools

You can also find free, third-party Python support tools for XML on the Web by following links at the XML SIG's web page. These include 4DOM, a DOM implementation for CORBA environments that currently supports two ORBs (ILU and Fnorb), and much more.

Documentation

As I wrote these words, a book dedicated to XML processing with Python was on the eve of its publication; check the books list at http://www.python.org or your favorite book outlet for details.

Given the rapid evolution of XML technology, I wouldn't wager on any of these resources being up to date a few years after this edition's release, so be sure to check Python's web site for more recent developments on this front.

In fact, the XML story changed substantially between the time I wrote this section and when I finally submitted it to O'Reilly. In Python 2.0, some of the tools described here as the PyXML SIG package have made their way into a standard xml module package in the Python library. In other words, they ship and install with Python itself; see the Python 2.0 library manual for more details. O'Reilly has a book in the works on this topic called Python and XML.
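As a rough illustration of what the standard xml package offers, the following sketch uses its xml.dom.minidom module to load a document into a DOM tree and walk it. Treat it as a sketch under stated assumptions, not a definitive recipe: the books_by_year function and the sample catalog data are hypothetical, invented for this example.

```python
from xml.dom import minidom

# Made-up sample document, used only for this sketch.
SAMPLE = """<catalog>
  <book year="2001"><title>Programming Python</title></book>
  <book year="1999"><title>Learning Python</title></book>
</catalog>"""


def books_by_year(xml_text):
    """Return a {title: year} dict built by walking the DOM tree."""
    document = minidom.parseString(xml_text)
    result = {}
    for book in document.getElementsByTagName("book"):
        title_node = book.getElementsByTagName("title")[0]
        # The title's text lives in a child text node of <title>.
        title = title_node.firstChild.data
        result[title] = book.getAttribute("year")
    return result


if __name__ == "__main__":
    for title, year in books_by_year(SAMPLE).items():
        print(year, title)
```

Unlike a callback-style parser, the DOM approach builds the whole document in memory first, which is convenient for small files but can be costly for large ones.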
