The MIME Protocol

If you haven't worked with email protocols before, chances are you have heard the term MIME a time or two but don't know much about it. This section explains what the MIME protocol is and how it works so that it can be implemented in a PHP script. Although not strictly required, I strongly recommend reviewing this section before moving on to the actual PHP script.

MIME (Multipurpose Internet Mail Extensions) is an addition to the standard email protocol, which by itself is used to send a simple text message. The basic premise behind the MIME protocol is to provide a means to both separate and group multiple types of content within an email message in a standardized fashion. Using this protocol, each individual "segment" of an email can be of a different MIME type (such as image/jpeg, text/plain, and so on) and even a different encoding (7bit or base64, for example).

As an example, Listing 16.1 is a very simple MIME-based email:

Listing 16.1. A Simple Email Using the MIME Protocol

From: "John Coggeshall" <john@php.net>
To: "Angie Sue" <angiesue@example.com>
Subject: Hey there!
Date: Fri, 28 Feb 2003 18:12:32 -0400
Message-ID: <somewhere@somecomputer.net>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Hey there Angie Sue! Where are you?

Compare this to a standard email message that does not implement the MIME protocol; the only difference between the two is the last three headers in Listing 16.1: MIME-Version, Content-Type, and Content-Transfer-Encoding. These three headers determine the nature of the rest of the email message. For instance, the Content-Type header in Listing 16.1 has been set to text/plain. However, the value for this header can be any valid MIME type, such as image/jpeg (for a JPEG format image), or text/html (for an HTML-formatted message). Of course, for the email to be properly rendered in the email client, the client must understand how to render an image or HTML document.
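To make the role of these three headers concrete, here is a minimal PHP sketch that assembles a message like the one in Listing 16.1 as a plain string. The function name buildSimpleMimeMessage() and its parameters are my own invention for illustration; nothing here sends mail, it only shows how the headers and body fit together.

```php
<?php
// Hypothetical helper (not a PHP built-in): assemble the header block and
// body of a single-part MIME message like the one in Listing 16.1.
function buildSimpleMimeMessage(
    array $headers,
    string $body,
    string $contentType = 'text/plain',
    string $encoding = '7bit'
): string {
    // The three MIME headers discussed above are appended last.
    $headers['MIME-Version'] = '1.0';
    $headers['Content-Type'] = $contentType;
    $headers['Content-Transfer-Encoding'] = $encoding;

    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }

    // One empty line separates the headers from the body;
    // lines in an email message end with CRLF.
    return implode("\r\n", $lines) . "\r\n\r\n" . $body;
}

$message = buildSimpleMimeMessage(
    [
        'From'    => '"John Coggeshall" <john@php.net>',
        'To'      => '"Angie Sue" <angiesue@example.com>',
        'Subject' => 'Hey there!',
    ],
    'Hey there Angie Sue! Where are you?'
);

echo $message, "\r\n";
```

Changing the third argument to text/html (and supplying an HTML body) is all it takes to change what the receiving client is told about the content.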

Although slightly more interesting than a standard email message, a basic MIME email is still pretty dry. It enables you to send an email message that contains an image or the like, but you are still limited to sending one content type. For the MIME protocol to be of any substantial use to us, we'll need a method of including multiple different content types in a single email. To facilitate this, the MIME protocol provides a set of content types that are all a part of the multipart/* family. Some of the more interesting content types in this family are shown in Table 16.1:

Table 16.1. Interesting multipart/* Content Types

Content Type           Description
multipart/mixed        Allows multiple different content types
multipart/alternative  Allows for multiple versions of the same content, each with a different content type
multipart/related      Allows for multiple different content types that are somehow related to one another

All content types in the multipart/* family, although different in function, are similar in principle. Specifically, when one of these content types is specified, an additional parameter named boundary must also be specified:

Content-Type: multipart/mixed; boundary=myboundary

This boundary parameter's value is used to determine where one segment of the particular multipart content begins and ends. Specifically, the beginning of each new segment is marked by two dashes followed by the value specified by the boundary parameter in the email message:

--myboundary

Likewise, the end of the final segment (and with it the end of the multipart content as a whole) is denoted by two dashes before and after the value specified by the boundary parameter:

--myboundary--

Because each segment within the multipart content type specifies its own headers (such as content type, encoding, and so on) these content types allow us to send text as well as data (such as an image) in a single email. Listing 16.2 illustrates this by showing an email that contains both a text message and an attachment (an image):

Listing 16.2. A Multipart/Mixed MIME Email Example

From: "John Coggeshall" <john@php.net>
To: "Angie Sue" <angiesue@example.com>
Subject: Here's a neat picture
Date: Mon, 23 Dec 2002 10:21:34 -0400
Message-ID: <somewhere@somecomputer.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="abcdefghi"
Content-Transfer-Encoding: 7bit

This is a MIME-based e-mail. If you are reading this message,
then your e-mail client does not support the MIME protocol.

--abcdefghi
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Hey there Angie Sue! Check out this neat image I found online.

- John
--abcdefghi
Content-Type: image/jpeg; name="angel.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment

<base64 encoded data for the file 'angel.jpg'>
--abcdefghi--

As you can see at the top of Listing 16.2, the Content-Type header for the main part of the email has been set to multipart/mixed, and a boundary has been created whose value is abcdefghi. When an email client encounters this, it reads through the body of the email until it reaches the first boundary marker (--abcdefghi). Everything between the end of the headers and that first boundary is ignored by a MIME-aware client, which makes it a convenient place for a note to readers whose clients do not support MIME. When a boundary is found, the client treats everything between that boundary and the next one as an independent segment and handles the data according to that segment's own headers. This process continues until the boundary-end marker (--abcdefghi--) is met.
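The walk-through above can be sketched in a few lines of PHP. This is only an illustration of the idea, not how any particular email client is implemented; the function name splitMultipartBody() is hypothetical, and the sketch ignores details such as nested multipart content.

```php
<?php
// Hypothetical helper: split the body of a multipart message into its
// segments, following the steps described above.
function splitMultipartBody(string $body, string $boundary): array
{
    $delimiter = '--' . $boundary;
    $segments  = [];
    $current   = null; // null while we are still in the ignored preamble

    foreach (preg_split('/\r\n|\n/', $body) as $line) {
        $trimmed = rtrim($line);

        if ($trimmed === $delimiter || $trimmed === $delimiter . '--') {
            // A boundary closes the segment collected so far, if any.
            if ($current !== null) {
                $segments[] = implode("\n", $current);
            }
            if ($trimmed === $delimiter . '--') {
                return $segments; // boundary-end marker: stop reading
            }
            $current = []; // a new segment starts here
            continue;
        }

        if ($current !== null) {
            $current[] = $line; // a line inside a segment
        }
    }

    // No end marker was found; be lenient and keep what was collected.
    if ($current !== null) {
        $segments[] = implode("\n", $current);
    }
    return $segments;
}

$body = implode("\n", [
    'This is a MIME-based e-mail.',
    '',
    '--abcdefghi',
    'Content-Type: text/plain',
    '',
    'Hey there Angie Sue!',
    '--abcdefghi',
    'Content-Type: image/jpeg',
    '',
    '<base64 encoded data>',
    '--abcdefghi--',
]);

$segments = splitMultipartBody($body, 'abcdefghi');
echo count($segments), " segments found\n";
```

Each returned segment still contains its own headers, an empty line, and its content, so it could be processed further in the same way as a complete message.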

In short, this email will be rendered as two different segments. The first segment has a content type of text/plain and contains a simple text message. The second segment has a content type of image/jpeg and contains the base64-encoded data of a JPEG image. Note that in Listing 16.2 the actual base64-encoded data for the image was omitted to save space.
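Going the other direction, the body of a message like Listing 16.2 could be produced along the following lines. Treat this as a sketch under a few assumptions: buildMixedBody() is a name I made up, and the "image" bytes are a short placeholder string rather than a real JPEG. The built-in functions base64_encode() and chunk_split() do the actual encoding work.

```php
<?php
// Hypothetical helper: build the body of a multipart/mixed message that
// holds one text segment and one base64-encoded attachment.
function buildMixedBody(
    string $boundary,
    string $text,
    string $fileData,
    string $fileName,
    string $fileType
): string {
    $eol = "\r\n";

    $textPart = 'Content-Type: text/plain' . $eol
              . 'Content-Transfer-Encoding: 7bit' . $eol . $eol
              . $text . $eol;

    // base64_encode() turns binary data into plain ASCII text;
    // chunk_split() wraps that text at 76 characters per line.
    $filePart = 'Content-Type: ' . $fileType . '; name="' . $fileName . '"' . $eol
              . 'Content-Transfer-Encoding: base64' . $eol
              . 'Content-Disposition: attachment' . $eol . $eol
              . chunk_split(base64_encode($fileData), 76, $eol);

    return '--' . $boundary . $eol . $textPart
         . '--' . $boundary . $eol . $filePart
         . '--' . $boundary . '--' . $eol;
}

// Placeholder bytes standing in for the contents of angel.jpg.
$fileData = "\xFF\xD8\xFF\xE0 placeholder bytes";

$body = buildMixedBody(
    'abcdefghi',
    'Hey there Angie Sue! Check out this neat image I found online.',
    $fileData,
    'angel.jpg',
    'image/jpeg'
);

echo $body;
```

In a real script the file contents would typically come from something like file_get_contents(), and the boundary would be generated rather than hard-coded.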

A number of things can be learned from this MIME email example. First, you should clearly understand how the multipart/mixed content type can be used to create individual segments within an email by separating each segment with a marker specified by the boundary parameter. Second, you should note that it is absolutely critical when dealing with MIME emails that the value specified by the boundary parameter be unique! For instance, what if I had specified the boundary parameter as the following:

Content-Type: multipart/mixed; boundary=John

This boundary works, but consider what happens if I sign my email using two dashes:

... Thanks Angie Sue, I appreciate it.
--John
PS -- Do you have that five bucks you owe me?

When interpreted by the email client, what will happen to my little postscript? Because the actual content of my email message contained the same value as my MIME boundary (--John), chances are that it will get lost in the digital void and not be displayed in my email. Worse yet, such an error may cause the email not to be rendered at all by the email client. Hence, it is incredibly important when constructing MIME emails that the value of each boundary parameter be unique enough so that it will not appear in the content of any given segment.
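One common way to get a boundary that is "unique enough" is to generate it randomly and, to be safe, confirm that it does not occur in the content. The sketch below takes that approach; both function names are hypothetical, and the boundary_ prefix is an arbitrary choice of mine.

```php
<?php
// Hypothetical helper: does "--<boundary>" occur in any segment body?
// This check is deliberately conservative: strictly speaking a collision
// only matters at the start of a line, but rejecting any occurrence is
// simpler and always safe.
function boundaryCollides(string $boundary, array $segmentBodies): bool
{
    foreach ($segmentBodies as $bodyText) {
        if (strpos($bodyText, '--' . $boundary) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: generate random boundaries until one is found
// that does not occur in the content.
function makeBoundary(array $segmentBodies): string
{
    do {
        // 16 random bytes rendered as 32 hexadecimal characters.
        $boundary = 'boundary_' . bin2hex(random_bytes(16));
    } while (boundaryCollides($boundary, $segmentBodies));

    return $boundary;
}

$signature = "... Thanks Angie Sue, I appreciate it.\n--John\n"
           . "PS -- Do you have that five bucks you owe me?";

// The boundary from the example above clashes with the signature line.
var_dump(boundaryCollides('John', [$signature]));

$boundary = makeBoundary([$signature]);
echo 'Content-Type: multipart/mixed; boundary="' . $boundary . '"', "\n";
```

With 32 random hexadecimal characters a collision is extremely unlikely in the first place, so the loop will almost always run exactly once; the explicit check simply removes the remaining doubt.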

Now that you have been introduced to attaching files to your emails using the multipart/mixed content type, let's introduce another member of the multipart/* family: multipart/alternative.

The multipart/alternative content type is used when sending multiple different versions of the same content in different formats. For instance, if you wanted to send two copies of the same email (a plain version and an HTML version) this is the content type that you would use. Like all members of the multipart/* family, a multipart/alternative message is structured identically to the multipart/mixed message previously described in detail: the same boundary parameter and the same segment markers apply.

One question that you may be asking yourself is how does the email client determine which version of the content to view? Unfortunately, there is no way to "force" an email client to view the email in a specific format; however, modern email clients are programmed in such a way that the "best" version of the email the client is capable of viewing is used. Hence, if your client can render HTML, chances are it will choose the HTML version of an email over the plain-text version, if given a choice.
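As a rough illustration, the body of a multipart/alternative message with a plain-text and an HTML version might be assembled as shown next. The function name buildAlternativeBody() and the boundary value are made up for this example. Note the order of the segments: the MIME specification (RFC 2046) asks senders to list the alternatives from the simplest to the richest, so the plain-text version comes first and the HTML version last.

```php
<?php
// Hypothetical helper: build the body of a multipart/alternative message
// carrying the same content as plain text and as HTML.
function buildAlternativeBody(string $boundary, string $plain, string $html): string
{
    $eol = "\r\n";

    // Simplest version first, richest version last.
    $versions = [
        'text/plain' => $plain,
        'text/html'  => $html,
    ];

    $body = '';
    foreach ($versions as $type => $content) {
        $body .= '--' . $boundary . $eol
               . 'Content-Type: ' . $type . $eol
               . 'Content-Transfer-Encoding: 7bit' . $eol . $eol
               . $content . $eol;
    }

    // Close the multipart content with the boundary-end marker.
    return $body . '--' . $boundary . '--' . $eol;
}

$boundary = 'alt_boundary_example'; // illustrative value only

$header = 'Content-Type: multipart/alternative; boundary="' . $boundary . '"';
$body   = buildAlternativeBody(
    $boundary,
    'Hey there Angie Sue! Where are you?',
    '<p>Hey there <b>Angie Sue</b>! Where are you?</p>'
);

echo $header, "\r\n\r\n", $body;
```

A client that cannot render HTML can simply display the first segment, whereas a client that can will typically pick the last segment it understands.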

HTML email brings us to the third and final member of the multipart/* family of MIME types that I'll be discussing: multipart/related. This content type functions in a similar fashion to the multipart/mixed type previously discussed, with one significant difference. When the multipart/related content type is used, the email client will treat all the individual segments defined within the multipart/related content type as pieces of the same, larger content. A prime example of this is HTML mail. Often when sending an email that uses HTML for formatting, it would be nice to include such things as images, sounds, and so on. The obvious way to accomplish this is to have these components on a Web server where they can be retrieved when the email is opened in the client. However, this relies on a number of uncontrollable factors (whether the email client will fetch the data from a remote server, whether the user is online, and so on). A much more effective method of sending HTML email is to include all the required components in the actual email itself. This is where the multipart/related content type comes in.

The concept behind the multipart/related content type, as I've discussed already, is to group multiple, different (yet related) segments within an email to be used in the rendering of a single document. Because the multipart/related content type is most commonly used when dealing with HTML-formatted email, I'll be discussing this content type in those terms. When dealing with HTML, often there is a need to reference additional files for a certain purpose, such as when including an image in your HTML document:

<IMG SRC="http://path/to/image/myimage.jpg">

As I already noted, this is still allowed when dealing with HTML-formatted email, but it is not recommended. Through the use of the multipart/related content type, you can include the desired image directly in the email and then have the email client automatically use it. This is accomplished by assigning a particular segment a unique identifying string through the use of a new header called Content-ID and then referencing that identifier where you normally would use a URL, as shown in Listing 16.3:

Listing 16.3. Using the Multipart/Related MIME Type

... standard e-mail headers omitted ...
Content-Type: multipart/related; boundary=abcdefghi
Content-Transfer-Encoding: 7bit

--abcdefghi
Content-Type: text/html
Content-Transfer-Encoding: 7bit

<IMG SRC="cid:myimage">
--abcdefghi
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
Content-ID: <myimage>

<base64 encoded data for "myimage.jpg">
--abcdefghi--

As you can see in Listing 16.3, this is an HTML-formatted email with an attached image. However, note that this image has been assigned an identifier through the use of the Content-ID header. This content identifier is then used in the HTML portion of the email in place of a standard URL for the SRC attribute by using the cid: URL scheme. When rendered, the email client will automatically use the included image within the formatted HTML email. This technique can be applied to include any resource that normally is accessed via a URL, such as images and sounds. The major benefit to this method is that everything is contained within a single "package."
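To tie the Content-ID header and the cid: reference together, here is a small PHP sketch that builds the body of such a message. Several things in it are assumptions of mine rather than anything prescribed by MIME: the function name buildRelatedBody(), the {{image}} placeholder in the HTML template, and the placeholder bytes standing in for a real image. The one detail that does come from the MIME specifications is that the value of the Content-ID header is wrapped in angle brackets, while the cid: reference in the HTML uses the bare identifier.

```php
<?php
// Hypothetical helper: build the body of a multipart/related message
// containing an HTML segment and one embedded image.
function buildRelatedBody(
    string $boundary,
    string $htmlTemplate,
    string $imageData,
    string $imageType,
    string $contentId
): string {
    $eol = "\r\n";

    // Replace the (made-up) {{image}} placeholder with a cid: reference
    // that points at the embedded segment.
    $html = str_replace('{{image}}', 'cid:' . $contentId, $htmlTemplate);

    return '--' . $boundary . $eol
         . 'Content-Type: text/html' . $eol
         . 'Content-Transfer-Encoding: 7bit' . $eol . $eol
         . $html . $eol
         . '--' . $boundary . $eol
         . 'Content-Type: ' . $imageType . $eol
         . 'Content-Transfer-Encoding: base64' . $eol
         // The header value is enclosed in angle brackets.
         . 'Content-ID: <' . $contentId . '>' . $eol . $eol
         . chunk_split(base64_encode($imageData), 76, $eol)
         . '--' . $boundary . '--' . $eol;
}

// Placeholder bytes standing in for the contents of myimage.jpg.
$imageData = "\xFF\xD8\xFF\xE0 placeholder bytes";

$body = buildRelatedBody(
    'abcdefghi',
    '<IMG SRC="{{image}}">',
    $imageData,
    'image/jpeg',
    'myimage'
);

echo $body;
```

Because the HTML and the image travel in the same message, the client can render the document without contacting a remote server.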