24.4. Changing and Generating XML
Just like for HTML and other kinds of structured text, the simplest way to output an XML document is often to prepare and write it using Python's normal string and file operations, covered in Chapter 9 and "File Objects" on page 216. Templating (covered in "Templating" on page 586) is also often the best approach. Subclassing class XMLGenerator (covered in "XMLGenerator" on page 597) is a good way to generate an XML document that is like an input XML document except for a few changes.
The xml.dom.minidom module offers yet another possibility because its classes support methods to generate, insert, remove, and alter nodes in a DOM tree that represents the document. You can create a DOM tree by parsing and then alter it, or you can create an empty DOM tree and populate it from scratch. You can output the resulting XML document with methods toxml, toprettyxml, or writexml of the Document instance. You can also output a subtree by calling these methods on the Node that is the subtree's root. The ElementTree module, mentioned in this chapter's introduction, also offers similar functionality (but with a more Pythonic API and much better performance).
24.4.1. Factory Methods of a Document Object
The Document class supplies factory methods to create instances of Node subclasses. The most frequently used factory methods of a Document instance d are as follows.
24.4.2. Mutating Methods of an Element Object
An instance e of class Element supplies methods to remove and add attributes.
24.4.3. Mutating Methods of a Node Object
An instance n of class Node supplies methods to remove, add, and replace children.
24.4.4. Output Methods of a Node Object
An instance n of class Node supplies methods to output the subtree rooted at n.
24.4.5. Changing and Outputting XHTML with xml.dom.minidom
The following example uses xml.dom.minidom to analyze an XHTML page and output it to standard output with each hyperlink's destination URL shown, within triple parentheses, just before the hyperlink:
import xml.dom.minidom, urllib, sys f = urllib.urlopen('http://www.w3.org/MarkUp/') doc = xml.dom.minidom.parse(f) as = doc.getElementsByTagName('a') for a in as: value = a.getAttribute('href') if value: newtext = doc.createTextNode(' (((%s)))'%value) a.parentNode.insertBefore(newtext,a) doc.writexml(sys.stdout, 'utf-8')
This example uses encoding 'utf-8' because that is the encoding that the XML standard specifies as the default, but you may want to change this detail, depending on the encoding your terminal window supports.