24.4. Changing and Generating XML

Just like for HTML and other kinds of structured text, the simplest way to output an XML document is often to prepare and write it using Python's normal string and file operations, covered in Chapter 9 and "File Objects" on page 216. Templating (covered in "Templating" on page 586) is also often the best approach. Subclassing class XMLGenerator (covered in "XMLGenerator" on page 597) is a good way to generate an XML document that is like an input XML document except for a few changes.

The xml.dom.minidom module offers yet another possibility because its classes support methods to generate, insert, remove, and alter nodes in a DOM tree that represents the document. You can create a DOM tree by parsing and then alter it, or you can create an empty DOM tree and populate it from scratch. You can output the resulting XML document with methods toxml, toprettyxml, or writexml of the Document instance. You can also output a subtree by calling these methods on the Node that is the subtree's root. The ElementTree module, mentioned in this chapter's introduction, also offers similar functionality (but with a more Pythonic API and much better performance).
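As a minimal sketch of the parse-then-alter route described above, the following snippet parses a small document, removes one element, and outputs both the whole document and a subtree. The sample XML and the variable names are assumptions made up for illustration.

```python
import xml.dom.minidom

# Hypothetical sample document, used only for illustration.
source = '<inventory><item>apple</item><item>pear</item></inventory>'
doc = xml.dom.minidom.parseString(source)

# Alter the parsed tree: drop the first <item> element.
root = doc.documentElement
first_item = root.getElementsByTagName('item')[0]
root.removeChild(first_item)

# Calling toxml on the Document includes the XML declaration;
# calling it on an Element outputs just the subtree rooted there.
whole_xml = doc.toxml()
subtree_xml = root.toxml()
print(whole_xml)
print(subtree_xml)
```

The only difference between the two outputs is the XML declaration, which is emitted only when the method is called on the Document itself.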

24.4.1. Factory Methods of a Document Object

The Document class supplies factory methods to create instances of Node subclasses. The most frequently used factory methods of a Document instance d are as follows.

createComment

d.createComment(data)

Builds and returns an instance c of class Comment for a comment with text data.

createElement

d.createElement(tagname)

Builds and returns an instance e of class Element for an element with the given tag.

createTextNode

d.createTextNode(data)

Builds and returns an instance t of class Text for a text node with text data.
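The three factory methods above might be combined as in the following sketch, which populates an empty document from scratch. The tag name and the text content are hypothetical; instantiating xml.dom.minidom.Document directly is just one way to obtain an empty Document.

```python
import xml.dom.minidom

# Start from an empty Document.
d = xml.dom.minidom.Document()

root = d.createElement('greeting')            # an Element
note = d.createComment('generated example')   # a Comment
text = d.createTextNode('Hello, world')       # a Text node

# Nodes built by factory methods have no parent until they are attached.
assert text.parentNode is None

d.appendChild(root)
root.appendChild(note)
root.appendChild(text)

result = root.toxml()
print(result)
```

Note that the factory methods only build nodes; each node becomes part of the tree only when a mutating method such as appendChild attaches it.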


24.4.2. Mutating Methods of an Element Object

An instance e of class Element supplies methods to remove and add attributes.

removeAttribute

e.removeAttribute(name)

Removes e's attribute with the given name.

setAttribute

e.setAttribute(name,value)

Changes e's attribute with the given name to have the given value, or adds to e a new attribute with the given name and value if e had no attribute named name.
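A short sketch of both methods, assuming a hypothetical one-element document; getAttribute and hasAttribute are used here only to inspect the result.

```python
import xml.dom.minidom

# Hypothetical element with two attributes.
doc = xml.dom.minidom.parseString(
    '<a href="old.html" target="_blank">link</a>')
e = doc.documentElement

e.setAttribute('href', 'new.html')   # changes the existing attribute
e.setAttribute('title', 'Example')   # adds a new attribute
e.removeAttribute('target')          # removes an existing attribute

href = e.getAttribute('href')
title = e.getAttribute('title')
has_target = e.hasAttribute('target')
print(href, title, has_target)
```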


24.4.3. Mutating Methods of a Node Object

An instance n of class Node supplies methods to remove, add, and replace children.

appendChild

n.appendChild(child)

Makes child the last child of n, whatever child's parent was (including n or None).

insertBefore

n.insertBefore(child,nextChild)

Makes child the child of n immediately before nextChild, whatever child's parent was (including n or None). nextChild must be a child of n.

removeChild

n.removeChild(child)

Makes child parentless and returns child. child must be a child of n.

replaceChild

n.replaceChild(child,oldChild)

Makes child the child of n in oldChild's place, whatever child's parent was (including n or None). oldChild must be a child of n. Returns oldChild.
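The four methods above can be exercised together as in this sketch. The list markup and the helper make_item are hypothetical, introduced only to keep the example short; the comments track the order of the children after each call.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString('<ul><li>b</li><li>c</li></ul>')
ul = doc.documentElement
b, c = ul.childNodes          # the two existing <li> elements

def make_item(text):
    """Hypothetical helper: build a parentless <li> holding text."""
    li = doc.createElement('li')
    li.appendChild(doc.createTextNode(text))
    return li

first, last, repl = make_item('a'), make_item('d'), make_item('x')

ul.insertBefore(first, b)         # children: a, b, c
ul.appendChild(last)              # children: a, b, c, d
old = ul.replaceChild(repl, c)    # children: a, b, x, d (returns c)
removed = ul.removeChild(b)       # children: a, x, d    (returns b)
ul.appendChild(first)             # moves a to the end: x, d, a

order = [li.firstChild.data for li in ul.childNodes]
print(order)
```

The last call shows that appendChild moves a node that is already a child of n rather than duplicating it.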


24.4.4. Output Methods of a Node Object

An instance n of class Node supplies methods to output the subtree rooted at n.

toprettyxml

n.toprettyxml(indent='\t',newl='\n')

Returns a string, plain or Unicode, with the XML source for the subtree rooted at n, using indent to indent nested tags and newl to end lines.

toxml

n.toxml()

Like n.toprettyxml('',''), i.e., inserts no extraneous whitespace.

writexml

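n.writexml(file,indent='',addindent='',newl='')

Writes the XML source for the subtree rooted at n to file-like object file, open for writing. indent is the indentation of n itself, addindent is the extra indentation for each level of nesting, and newl is the string that ends each line. When n is a Document, writexml also accepts a named argument encoding (default None), which sets the encoding named in the XML declaration. writexml does not itself encode the text it writes, so file.write must accept text (Unicode) strings.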
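The three output methods can be compared side by side, as in the sketch below. It assumes Python 3, where io.StringIO is a convenient in-memory file-like object; the sample document is hypothetical.

```python
import io
import xml.dom.minidom

doc = xml.dom.minidom.parseString('<root><child>text</child></root>')
root = doc.documentElement

# Compact output: no whitespace is added.
compact = root.toxml()

# Pretty output: nested tags are indented by two spaces per level.
pretty = root.toprettyxml(indent='  ', newl='\n')

# writexml sends the output to a file-like object instead of returning it.
# On a Document, encoding sets the encoding named in the XML declaration.
buf = io.StringIO()
doc.writexml(buf, encoding='utf-8')
written = buf.getvalue()

print(compact)
print(pretty)
print(written)
```

Because toprettyxml adds whitespace between tags, its output parses back into a tree with extra whitespace-only text nodes; toxml is the safer choice when the result will be parsed again.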


24.4.5. Changing and Outputting XHTML with xml.dom.minidom

The following example uses xml.dom.minidom to analyze an XHTML page and output it to standard output with each hyperlink's destination URL shown, within triple parentheses, just before the hyperlink:

import sys
import xml.dom.minidom
from urllib.request import urlopen  # in Python 2: from urllib import urlopen

f = urlopen('http://www.w3.org/MarkUp/')
doc = xml.dom.minidom.parse(f)
anchors = doc.getElementsByTagName('a')
for a in anchors:
    value = a.getAttribute('href')
    if value:
        newtext = doc.createTextNode(' (((%s)))'%value)
        a.parentNode.insertBefore(newtext,a)

# pass encoding by name: the second positional parameter of writexml is indent
doc.writexml(sys.stdout, encoding='utf-8')

This example uses encoding 'utf-8' because that is the encoding that the XML standard specifies as the default, but you may want to change this detail, depending on the encoding your terminal window supports.
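Because the example above depends on fetching a live page, the same technique can be tried offline on a string. The following is a sketch under that assumption; the XHTML fragment is hypothetical and stands in for a downloaded page.

```python
import xml.dom.minidom

# Hypothetical XHTML fragment standing in for a downloaded page.
page = ('<html><body><p>See <a href="http://www.w3.org/">W3C</a> '
        'and <a name="top">this anchor</a>.</p></body></html>')
doc = xml.dom.minidom.parseString(page)

for a in doc.getElementsByTagName('a'):
    value = a.getAttribute('href')
    if value:  # skip <a> elements that have no href attribute
        newtext = doc.createTextNode(' (((%s)))' % value)
        a.parentNode.insertBefore(newtext, a)

annotated = doc.documentElement.toxml()
print(annotated)
```

Only the first a element gains a preceding text node, since getAttribute returns an empty string for the second one, which has no href.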

