Chapter 24. Structured Text: XML
XML, the eXtensible Markup Language, has become very widespread over the last few years. Like SGML (mentioned in "The sgmllib Module" on page 576), XML is a metalanguage, a language to describe markup languages. On top of XML 1.0, the XML community (mostly within the World Wide Web Consortium [W3C]) has standardized many other technologies, such as schema languages, Namespaces, XPath, XLink, XPointer, and XSLT.
Industry consortia in many fields have defined industry-specific markup languages on top of XML to facilitate data exchange among applications in those fields. Such industry standards let applications exchange data even when the applications are coded in different languages and deployed on different platforms by different firms. Together, XML, its related technologies, and XML-based markup languages are a common basis for inter-application, cross-language, cross-platform data interchange in modern applications.
Python has excellent support for XML. The standard Python library supplies the xml package, which lets you use fundamental XML technology quite simply. The third-party package PyXML (http://pyxml.sf.net) extends the standard library's xml with validating parsers, richer DOM implementations, and advanced technologies such as XPath and XSLT. Installing PyXML also upgrades Python's own xml packages, so it can be worth installing even if you don't use PyXML-specific features.
On top of PyXML, you can choose to install yet another freely available third-party package, 4Suite (available at http://4suite.org). 4Suite provides even more XML parsers for special niches, advanced technologies such as XLink and XPointer, and code supporting standards built on top of XML, such as the Resource Description Framework (RDF).
A highly Pythonic alternative for XML processing is ElementTree (http://effbot.org/zone/element-index.htm). Most of its functionality is also slated for release in Python 2.5 as the standard library package xml.etree. For use in Python 2.3 or 2.4, or for more complete functionality even in 2.5, you can download and install the complete ElementTree from the URL at effbot.org. ElementTree's elegance, speed, and Pythonic architecture make it the package of choice for most Python XML applications, particularly (but not exclusively) if you can restrict your application to run on Python 2.5. However, I do not cover ElementTree in this book.
In this chapter, I cover only the essentials of the standard library's xml package, taking for granted some elementary knowledge of XML itself.
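To give a first taste of what the standard library's xml package offers, here is a minimal sketch that uses xml.dom.minidom to parse a small document and collect data from it. The sample document, its element and attribute names (catalog, book, title, id), and the helper function titles_by_id are all hypothetical, invented for this illustration only; treat this as one possible approach rather than a recommended pattern.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<?xml version="1.0"?>
<catalog>
  <book id="b1"><title>First Title</title></book>
  <book id="b2"><title>Second Title</title></book>
</catalog>"""

def titles_by_id(xml_text):
    """Return a dict mapping each book's id attribute to its title text."""
    doc = minidom.parseString(xml_text)
    try:
        result = {}
        for book in doc.getElementsByTagName("book"):
            title = book.getElementsByTagName("title")[0]
            # Join the text nodes found directly under <title>.
            text = "".join(node.data for node in title.childNodes
                           if node.nodeType == node.TEXT_NODE)
            result[book.getAttribute("id")] = text
        return result
    finally:
        # Break the DOM's internal reference cycles once we are done.
        doc.unlink()

if __name__ == "__main__":
    print(titles_by_id(SAMPLE))
```

The sketch assumes every book element has an id attribute and at least one title child; real documents would need checks for missing elements.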