22.2. MIME and Email Format Handling
Python supplies the email package to handle parsing, generation, and manipulation of MIME files such as email messages, network news posts, and so on. The Python standard library also contains other modules that handle some parts of these jobs. However, the email package offers a complete and systematic approach to these important tasks. I suggest you use package email, not the older modules that partially overlap with parts of email's functionality. Package email has nothing to do with receiving or sending email; for such tasks, see modules poplib and smtplib, covered in "Email Protocols" on page 503. email deals with handling messages after you receive them or before you send them.
22.2.1. Functions in Package email
Package email supplies two factory functions that return an instance m of class email.Message.Message. These functions rely on class email.Parser.Parser, but the factory functions are handier and simpler. Therefore, I do not cover module Parser further in this book.
message_from_string(s)
Builds m by parsing string s.
message_from_file(f)
Builds m by parsing the contents of file-like object f, which must be open for reading.
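As a minimal sketch of the two factory functions, the following parses the same made-up message text once from a string and once from a file-like object. It is written for Python 3 and assumes an in-memory io.StringIO object stands in for a real file; the addresses and subject are invented for illustration.

```python
import email
import io

# Sample message text; the addresses and subject are made up for illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Just a short test body.\n"
)

# Parse from a string.
m1 = email.message_from_string(raw)

# Parse from a file-like object open for reading (here, an in-memory one).
m2 = email.message_from_file(io.StringIO(raw))

print(m1["Subject"])       # Hello
print(m2.get_payload())    # the body text that follows the blank line
```

Both calls should yield equivalent Message instances, so which one you use depends only on where the message text comes from.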
22.2.2. The email.Message Module
The email.Message module supplies class Message. All parts of package email make, modify, or use instances of class Message. An instance m of Message models a MIME message, including headers and a payload (data content). To create an initially empty m, call class Message with no arguments. More often, you create m by parsing via functions message_from_string and message_from_file of module email, or by other indirect means such as the classes covered in "Creating Messages" on page 568. m's payload can be a string, a single other instance of Message, or a list of other Message instances for a multipart message.
You can set arbitrary headers on email messages you're building. Several Internet RFCs specify headers for a wide variety of purposes. The main applicable RFC is RFC 2822 (see http://www.faqs.org/rfcs/rfc2822.html). An instance m of class Message holds headers as well as a payload. m is a mapping, with header names as keys and header value strings as values. To make m more convenient, the semantics of m as a mapping are different from those of a dictionary. m's keys are case-insensitive. m keeps headers in the order in which you add them, and methods keys, values, and items return headers in that order. m can have more than one header named key: m[key] returns an arbitrary header, and del m[key] deletes all of them. len(m) returns the total number of headers, counting duplicates, not just the number of distinct header names. If there is no header named key, m[key] returns None and does not raise KeyError (i.e., behaves like m.get(key)), and del m[key] is a no-operation in this case. You cannot loop directly on m; loop on m.keys( ) instead.
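The mapping semantics just described can be sketched as follows. This example is written for Python 3, where the module is spelled email.message rather than email.Message; the header values are made up, and the variable names are only illustrative.

```python
from email.message import Message  # Python 3 spelling of email.Message

m = Message()
m["Received"] = "from a.example.com"
m["Received"] = "from b.example.com"  # adds a second header, does not replace
m["Subject"] = "Mapping demo"

subject = m["SUBJECT"]            # keys are case-insensitive
total = len(m)                    # counts duplicates too
names = m.keys()                  # header names in insertion order
received = m.get_all("Received")  # every value for a repeated header
missing = m["X-Missing"]          # None, no KeyError

del m["received"]                 # deletes both Received headers
del m["X-Missing"]                # no such header: a no-op
remaining = m.keys()

print(subject, total, names, received, missing, remaining)
```

Because m[key] gives you only one of several same-named headers, get_all is the safer choice whenever a header such as Received may legitimately repeat.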
An instance m of Message supplies the following attributes and methods that deal with m's headers and payload.
m.add_header(_name,_value,**_params)
Like m[_name]=_value, but you can also supply header parameters as named arguments. For each named argument pname=pvalue, add_header changes underscores in pname to dashes, then appends to the header's value a parameter of the form:
; pname="pvalue"
If pvalue is None, add_header appends only a parameter '; pname'.
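Here is a small sketch of add_header, written for Python 3 (module email.message rather than email.Message). The header X-Demo, the parameter some_flag, and the filename are hypothetical names chosen only to show the underscore-to-dash conversion and the None case.

```python
from email.message import Message  # Python 3 spelling of email.Message

m = Message()

# A named argument with a string value becomes a quoted parameter.
m.add_header("Content-Disposition", "attachment", filename="report.txt")

# A named argument whose value is None becomes a bare parameter name;
# the underscore in some_flag (a made-up name) turns into a dash.
m.add_header("X-Demo", "value", some_flag=None)

print(m["Content-Disposition"])   # attachment; filename="report.txt"
print(m["X-Demo"])                # value; some-flag
```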
m.as_string(unixfrom=False)
Returns the entire message as a string (the message's payload must be a string). When unixfrom is true, also includes a first line, normally starting with 'From ', known as the envelope header of the message.
m.attach(payload)
Adds payload to m's payload. If m's payload was None, m's payload is now the single-item list [payload]. If m's payload was a list, appends payload to the list. Otherwise, m.attach(payload) raises MultipartConversionError.
m.epilogue
Attribute m.epilogue can be None or a string that becomes part of the message's string form after the last boundary line. Mail programs normally don't display this text. epilogue is a normal attribute of m: your program can access it when you're handling an m built by whatever means, and bind it when you're building or modifying m.
m.get_all(name,default=None)
Returns a list with all values of headers named name in the order in which the headers were added to m. When m has no header named name, get_all returns default.
m.get_boundary(default=None)
Returns the string value of the boundary parameter of m's Content-Type header. When m has no Content-Type header, or the header has no boundary parameter, get_boundary returns default.
m.get_charsets(default=None)
Returns the list L of string values of parameter charset of m's Content-Type headers. When m is multipart, L has one item per part; otherwise, L has length 1. For parts that have no Content-Type, no charset parameter, or a main type different from 'text', the corresponding item in L is default.
m.get_content_maintype( )
Returns m's main content type: a lowercased string 'maintype' taken from header Content-Type. When m has no header Content-Type, get_content_maintype returns the main type of m's default content type, normally 'text'.
m.get_content_subtype( )
Returns m's content subtype: a lowercased string 'subtype' taken from header Content-Type. When m has no header Content-Type, get_content_subtype returns the subtype of m's default content type, normally 'plain'.
m.get_content_type( )
Returns m's content type: a lowercased string 'maintype/subtype' taken from header Content-Type. When m has no header Content-Type, get_content_type returns m's default content type, normally 'text/plain'.
m.get_filename(default=None)
Returns the string value of the filename parameter of m's Content-Disposition header. When m has no Content-Disposition, or the header has no filename parameter, get_filename returns default.
m.get_param(param,default=None,header='Content-Type')
Returns the string value of parameter param of m's header header. Returns the empty string for a parameter specified just by name. When m has no header header, or the header has no parameter named param, get_param returns default.
m.get_params(default=None,header='Content-Type')
Returns the parameters of m's header header, a list of pairs of strings that give each parameter's name and value. Uses the empty string as the value for parameters specified just by name. When m has no header header, get_params returns default.
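The header-inspection methods above can be tried out on a small parsed message. The sketch below is written for Python 3; the message text, charset, and filename are made up for illustration.

```python
import email

# A made-up single-part message with parameters on two headers.
raw = (
    'Content-Type: text/plain; charset="utf-8"; format=flowed\n'
    'Content-Disposition: attachment; filename="notes.txt"\n'
    "\n"
    "body\n"
)
m = email.message_from_string(raw)

content_type = m.get_content_type()            # 'text/plain'
charset = m.get_param("charset")               # quotes are stripped
fallback = m.get_param("missing", "n/a")       # default when absent
disposition_name = m.get_param("filename", header="Content-Disposition")
filename = m.get_filename()
params = m.get_params()                        # list of (name, value) pairs
charsets = m.get_charsets()                    # one item: m is not multipart
boundary = m.get_boundary()                    # None: no boundary parameter

print(content_type, charset, fallback, filename, charsets, boundary)
print(params)
```

Note that get_param looks at Content-Type unless you name another header, which is why the filename lookup passes header explicitly.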
m.get_payload(i=None,decode=False)
Returns m's payload. When m.is_multipart( ) is False, i must be None, and m.get_payload( ) returns m's entire payload, a string or Message instance. If decode is true, and the value of header Content-Transfer-Encoding is either 'quoted-printable' or 'base64', m.get_payload also decodes the payload. If decode is false, or header Content-Transfer-Encoding is missing or has other values, m.get_payload returns the payload unchanged.
When m.is_multipart( ) is true, decode must be false. When i is None, m.get_payload( ) returns m's payload as a list. Otherwise, m.get_payload(i) returns the ith item of the payload, or raises IndexError if i is out of range.
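The two modes of get_payload can be sketched as follows, for Python 3 (where a decoded payload comes back as a bytes object). The message texts are made up, and BOUND is an arbitrary boundary string chosen for the example.

```python
import base64
import email

data = b"binary \x00\x01 data"
encoded = base64.b64encode(data).decode("ascii")

# A single-part message whose body is base64 text.
single = email.message_from_string(
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + encoded + "\n"
)
as_stored = single.get_payload()             # still the base64 text
decoded = single.get_payload(decode=True)    # the original bytes

# A two-part message; BOUND is an arbitrary boundary string.
multi = email.message_from_string(
    'Content-Type: multipart/mixed; boundary="BOUND"\n'
    "\n"
    "--BOUND\n"
    "Content-Type: text/plain\n"
    "\n"
    "first\n"
    "--BOUND\n"
    "Content-Type: text/plain\n"
    "\n"
    "second\n"
    "--BOUND--\n"
)
parts = multi.get_payload()      # a list of Message instances
second = multi.get_payload(1)    # the item at index 1

print(decoded == data, len(parts), second.get_payload())
```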
m.get_unixfrom( )
Returns the envelope header string for m, or None if m has no envelope header.
m.is_multipart( )
Returns True when m's payload is a list; otherwise, False.
m.preamble
Attribute m.preamble can be None or a string that becomes part of the message's string form before the first boundary line. A mail program shows this text only if it doesn't support multipart messages, so you can use this attribute to alert the user that your message is multipart and a different mail program is needed to view it. preamble is a normal attribute of m: your program can access it when you're handling an m that is built by whatever means and bind it when you're building or modifying m.
m.set_boundary(boundary)
Sets the boundary parameter of m's Content-Type header to boundary. When m has no Content-Type header, raises HeaderParseError.
m.set_payload(payload)
Sets m's payload to payload, which must be a string or list, as appropriate.
m.set_unixfrom(unixfrom)
Sets the envelope header string for m. unixfrom is the entire envelope header line, including the leading 'From ' but not including the trailing '\n'.
m.walk( )
Returns an iterator on all parts and subparts of m to walk the tree of parts depth-first.
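To see the depth-first order in which walk visits a tree of parts, consider this sketch for Python 3 (module email.message rather than email.Message). The helper function leaf is hypothetical, introduced only to keep the example short.

```python
from email.message import Message  # Python 3 spelling of email.Message

def leaf(text):
    """Hypothetical helper: build a text/plain part holding text."""
    part = Message()
    part["Content-Type"] = "text/plain"
    part.set_payload(text)
    return part

# An inner multipart nested inside an outer multipart.
inner = Message()
inner["Content-Type"] = "multipart/alternative"
inner.attach(leaf("inner one"))
inner.attach(leaf("inner two"))

outer = Message()
outer["Content-Type"] = "multipart/mixed"
outer.attach(leaf("first"))
outer.attach(inner)
outer.attach(leaf("last"))

# walk yields outer itself first, then descends into each subpart in turn.
types = [part.get_content_type() for part in outer.walk()]
texts = [part.get_payload() for part in outer.walk()
         if not part.is_multipart()]

print(types)
print(texts)
```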
22.2.3. The email.Generator Module
The email.Generator module supplies class Generator, which you can use to generate the textual form of a message m. m.as_string and str(m) may be sufficient, but class Generator gives you more flexibility. You instantiate Generator with a mandatory argument and two optional arguments.
class Generator(outfp,mangle_from_=True,maxheaderlen=78)
outfp is a file or file-like object that supplies method write. When mangle_from_ is true, g prepends '>' to any line in the payload that starts with 'From ' to make the message's textual form easier to parse. g wraps each header line at semicolons into physical lines of no more than maxheaderlen characters. To use g, call g.flatten:
g.flatten(m,unixfrom=False)
This emits m as text to outfp, like outfp.write(m.as_string(unixfrom)).
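A minimal sketch of Generator in use follows. It is written for Python 3, where the module is spelled email.generator rather than email.Generator, and it assumes an in-memory io.StringIO object as outfp; the subject and payload are made up to trigger the 'From ' mangling.

```python
import io
from email.generator import Generator  # Python 3 spelling of email.Generator
from email.message import Message

m = Message()
m["Subject"] = "Generator demo"
m.set_payload("From the top of this line, mangling applies.\nSecond line.\n")

# Any object with a write method can serve as outfp.
out = io.StringIO()
g = Generator(out, mangle_from_=True, maxheaderlen=60)
g.flatten(m)

text = out.getvalue()
print(text)   # the payload line that began with 'From ' now begins with '>From '
```

Passing mangle_from_ explicitly, as here, avoids depending on the default, which may differ between Python versions.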
22.2.4. Creating Messages
Package email supplies modules with names that start with 'MIME', each module supplying a subclass of Message named like the module. These classes make it easier to create Message instances of various MIME types. The MIME classes are as follows.
class MIMEAudio(_audiodata,_subtype=None,_encoder=None, **_params)
_audiodata is a byte string of audio data to pack in a message of MIME type 'audio/_subtype'. When _subtype is None, _audiodata must be parseable by standard Python module sndhdr to determine the subtype; otherwise, MIMEAudio raises a TypeError. When _encoder is None, MIMEAudio encodes data as Base 64, which is generally optimal. Otherwise, _encoder must be callable with one parameter m, which is the message being constructed; _encoder must then call m.get_payload( ) to get the payload, encode the payload, put the encoded form back by calling m.set_payload, and set m['Content-Transfer-Encoding'] appropriately. MIMEAudio passes the _params dictionary of named-argument names and values to m.add_header to construct m's Content-Type.
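class MIMEBase(_maintype,_subtype,**_params)
The base class of all MIME classes; directly subclasses Message. Instantiating:
m = MIMEBase(main,sub,**params)
is equivalent to the longer and less convenient idiom:
m = Message( )
m.add_header('Content-Type','%s/%s'%(main,sub),**params)
m['MIME-Version'] = '1.0'
class MIMEImage(_imagedata,_subtype=None,_encoder=None,**_params)
Like MIMEAudio, but with main type 'image'; uses standard Python module imghdr to determine the subtype, if needed.
class MIMEMessage(msg,_subtype='rfc822')
Packs msg, which must be an instance of Message (or a subclass), as the payload of a message of MIME type 'message/_subtype'.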
class MIMEText(_text,_subtype='plain',_charset='us-ascii', _encoder=None)
Packs text string _text as the payload of a message of MIME type 'text/_subtype' with the given charset. When _encoder is None, MIMEText does not encode the text, which is generally optimal. Otherwise, _encoder must be callable with one parameter m, which is the message being constructed; _encoder must then call m.get_payload( ) to get the payload, encode the payload, put the encoded form back by calling m.set_payload, and set m['Content-Transfer-Encoding'] appropriately.
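The following sketch builds a two-part message from the MIME classes. It is written for Python 3, where the modules are spelled email.mime.base and email.mime.text rather than email.MIMEBase and email.MIMEText; the subject and body texts are made up.

```python
from email.mime.base import MIMEBase   # Python 3 spelling of email.MIMEBase
from email.mime.text import MIMEText   # Python 3 spelling of email.MIMEText

# Two leaf parts with different text subtypes.
text_part = MIMEText("Hello, world.\n", "plain", "us-ascii")
html_part = MIMEText("<p>Hello, world.</p>\n", "html", "us-ascii")

# MIMEBase sets Content-Type and MIME-Version; attach then turns the
# (initially empty) payload into a list of parts.
container = MIMEBase("multipart", "mixed")
container["Subject"] = "Two parts"
container.attach(text_part)
container.attach(html_part)

part_types = [p.get_content_type() for p in container.walk()]
print(part_types)
print(container.as_string())
```

When the container has no boundary parameter, generating its string form picks a boundary for you, so you rarely need to call set_boundary yourself.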
22.2.5. The email.Encoders Module
The email.Encoders module supplies functions that take a message m as their only argument, encode m's payload, and set m's headers appropriately.
encode_base64(m)
Uses Base 64 encoding, optimal for arbitrary binary data.
encode_noop(m)
Does nothing to m's payload and headers.
encode_quopri(m)
Uses Quoted Printable encoding, optimal for text that is almost but not fully ASCII.
encode_7or8bit(m)
Does nothing to m's payload, and sets header Content-Transfer-Encoding to '8bit' if any byte of m's payload has the high bit set; otherwise, to '7bit'.
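As a sketch of how the encoder functions act on a message, the example below applies Base 64 encoding to a binary payload and the 7-or-8-bit check to a plain ASCII one. It is written for Python 3, where the module is spelled email.encoders rather than email.Encoders; the payloads are made up.

```python
from email import encoders             # Python 3 spelling of email.Encoders
from email.message import Message

original = b"\x00\x01\x02 raw bytes"

# Base 64: the payload is replaced by its encoded text form and the
# Content-Transfer-Encoding header is set accordingly.
binary = Message()
binary["Content-Type"] = "application/octet-stream"
binary.set_payload(original)
encoders.encode_base64(binary)

# 7-or-8-bit: the payload is left alone; only the header is set.
plain = Message()
plain.set_payload("plain ascii text")
encoders.encode_7or8bit(plain)

print(binary["Content-Transfer-Encoding"])   # base64
print(plain["Content-Transfer-Encoding"])    # 7bit
print(binary.get_payload(decode=True) == original)
```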
22.2.6. The email.Utils Module
The email.Utils module supplies several functions useful for email processing.
formataddr(pair)
pair is a pair of strings (realname,email_address). formataddr returns a string s with the address to insert in header fields such as To and Cc. When realname is false (e.g., ''), formataddr returns email_address.
formatdate(timeval=None,localtime=False)
timeval is a number of seconds since the epoch. When timeval is None, formatdate uses the current time. When localtime is true, formatdate uses the local time zone; otherwise, it uses UTC. formatdate returns a string with the given time instant formatted in the way specified by RFC 2822.
getaddresses(L)
Parses each item of L, a list of address strings as used in header fields such as To and Cc, and returns a list of pairs of strings (name,email_address). When getaddresses cannot parse an item of L as an address, getaddresses uses (None,None) as the corresponding item in the list it returns.
mktime_tz(t)
t is a tuple with 10 items. The first nine items of t are in the same format used in module time, covered in "The time Module" on page 302. t[-1] is a time zone as an offset in seconds from UTC (with the opposite sign from time.timezone, as specified by RFC 2822). When t[-1] is None, mktime_tz uses the local time zone. mktime_tz returns a float with the number of seconds since the epoch, in UTC, corresponding to the instant that t denotes.
parseaddr(s)
Parses string s, which contains an address as typically specified in header fields such as To and Cc, and returns a pair of strings (realname,email_address). When parseaddr cannot parse s as an address, parseaddr returns ('','').
parsedate(s)
Parses string s as per the rules in RFC 2822 and returns a tuple t with nine items, as used in module time, covered in "The time Module" on page 302 (the items t[-3:] are not meaningful). parsedate also attempts to parse some erroneous variations on RFC 2822 that widespread mailers use. When parsedate cannot parse s, parsedate returns None.
parsedate_tz(s)
Like parsedate, but returns a tuple t with 10 items, where t[-1] is s's time zone as an offset in seconds from UTC (with the opposite sign from time.timezone, as specified by RFC 2822), like in the argument that mktime_tz accepts. Items t[-4:-1] are not meaningful. When s has no time zone, t[-1] is None.
quote(s)
Returns a copy of string s, where each double quote (") becomes '\"' and each existing backslash is repeated.
unquote(s)
Returns a copy of string s where leading and trailing double-quote characters (") and angle brackets (<>) are removed if they surround the rest of s.
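The utility functions can be exercised together in a short sketch. It is written for Python 3, where the module is spelled email.utils rather than email.Utils; the names, addresses, and date string are made up for illustration.

```python
from email import utils   # Python 3 spelling of email.Utils

# Addresses: format a (realname, email_address) pair and parse it back.
addr = utils.formataddr(("Alice Example", "alice@example.com"))
bare = utils.formataddr(("", "bob@example.com"))
pair = utils.parseaddr(addr)
pairs = utils.getaddresses(["Alice <alice@example.com>", "bob@example.com"])

# Dates: parse an RFC 2822 date, convert it to seconds since the epoch,
# then format that instant again in UTC.
t = utils.parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")
offset = t[-1]                      # time zone offset in seconds from UTC
stamp = utils.mktime_tz(t)
in_utc = utils.formatdate(stamp)    # five hours later than the local reading

# Quoting helpers.
quoted = utils.quote('say "hi"')
unquoted = utils.unquote('"quoted"')
unbracketed = utils.unquote("<alice@example.com>")

print(addr, bare, pair, pairs)
print(offset, in_utc)
print(quoted, unquoted, unbracketed)
```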
22.2.7. Example Uses of the email Package
The email package helps you both read and compose email and email-like messages. (It has nothing to do with receiving or transmitting such messages: those tasks belong to the separate modules covered in Chapters 19 and 20.) Here is an example of how to use the email package to read a possibly multipart message and unpack each part into a file in a given directory:
import os, email

def unpack_mail(mail_file, dest_dir):
    ''' Given file object mail_file, open for reading, and dest_dir, a
        string that is a path to an existing, writable directory, unpack
        each part of the mail message from mail_file to a file within
        dest_dir.
    '''
    try: msg = email.message_from_file(mail_file)
    finally: mail_file.close( )
    for part_number, part in enumerate(msg.walk( )):
        if part.get_content_maintype( ) == "multipart":
            # a multipart container holds no data of its own; walk
            # yields each of its subparts later in this loop
            continue
        dest = part.get_filename( )
        if dest is None: dest = part.get_param("name")
        if dest is None: dest = "part-%i" % part_number
        # In real life, make sure that dest is a reasonable filename
        # for your OS; otherwise, mangle that name until it is!
        f = open(os.path.join(dest_dir, dest), "wb")
        try: f.write(part.get_payload(decode=True))
        finally: f.close( )
And here is an example that performs roughly the reverse task, packaging all files that are directly under a given source directory into a file suitable for mailing:
import os, email.Message

def pack_mail(source_dir, **headers):
    ''' Given source_dir, a string that is a path to an existing, readable
        directory, and arbitrary header name/value pairs passed in as named
        arguments, packs all the files directly under source_dir (assumed to
        be plain text files) into a mail message returned as a string.
    '''
    msg = email.Message.Message( )
    for name, value in headers.iteritems( ):
        msg[name] = value
    msg['Content-type'] = 'multipart/mixed'
    # os.walk yields (dirpath, dirnames, filenames) triples; keep only
    # the filenames of the top-level directory
    filenames = os.walk(source_dir).next( )[-1]
    for filename in filenames:
        m = email.Message.Message( )
        m.add_header("Content-type", 'text/plain', name=filename)
        f = open(os.path.join(source_dir, filename), "r")
        try: m.set_payload(f.read( ))
        finally: f.close( )
        msg.attach(m)
    return msg.as_string( )
22.2.8. The Message Classes of the rfc822 and mimetools Modules
The best way to handle email-like messages is with package email. However, some other modules covered in Chapters 19 and 21 use instances of class rfc822.Message or its subclass, mimetools.Message. This section covers the subset of these classes' functionality that you need to make effective use of the modules covered in Chapters 19 and 21.
An instance m of class Message is a mapping, with the headers' names as keys and the corresponding header value strings as values. Keys and values are strings, and keys are case-insensitive. m supports all mapping methods except clear, copy, popitem, and update. get and setdefault default to '' instead of None. Instance m also supplies convenience methods (e.g., to combine getting a header's value and parsing it as a date or an address). I suggest you use for such purposes the functions of module email.Utils (covered in "The email.Utils Module" on page 570) and use m just as a mapping.
When m is an instance of mimetools.Message, m supplies additional methods.
m.getmaintype( )
Returns m's main content type, taken from header Content-Type converted to lowercase. When m has no header Content-Type, getmaintype returns 'text'.
m.getparam(param)
Returns the string value of the parameter named param of m's header Content-Type.
m.getsubtype( )
Returns m's content subtype, taken from header Content-Type converted to lowercase. When m has no Content-Type, getsubtype returns 'plain'.
m.gettype( )
Returns m's content type, taken from header Content-Type converted to lowercase. When m has no Content-Type, gettype returns 'text/plain'.