6.10. Unit Testing
We've already talked about unit testing and the problems inherent in testing web applications. Our conclusions were that testing regular web interaction in an automated manner is very difficult, but that standalone processes with fixed inputs and expected outputs are ripe for creating a test suite, especially when the rules of the system are complex.
We fit that description perfectly when it comes to processing email. We have some known inputs (mail people have sent us) and expected outputs (specific text and attachments to be extracted). By putting these details into a test harness, we can have a suite of tests to run against our system any time we make a change, which helps us avoid reintroducing old bugs while fixing new ones or adding features.
The first step in setting up a regression test suite is to gather the inputs and list their desired outputs. Since we don't know all of the input we'll be getting, what we ideally need is some way to gather inputs as we get them for later examination. If you have your application write each mail to disk as it's received, you can find the troublesome email when a problem arises.
Some emails are obviously troublesome and can be dealt with automatically. When you expect an attached file but none is found, you can fire a response back to the user saying so. If you send the user a mail containing an identifier for the stored email, then you can easily find it again if the user reports the behavior as a bug. Flickr sends out emails that look a little like this:
Sorry, we couldn't find any photos attached to your email. If you think this is an error, please reply to this email and tell us about it. Reference code: 20050929-210346-8473
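A minimal sketch of that automatic response, using Python's standard email package, might look like the following. The function names, the subject line, and the way the reference code is passed in are assumptions made for illustration; they are not Flickr's actual code.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def has_attachment(raw: bytes) -> bool:
    """Return True if the raw email contains at least one attachment."""
    # policy.default gives us an EmailMessage, which has iter_attachments().
    msg = message_from_bytes(raw, policy=policy.default)
    return any(True for _ in msg.iter_attachments())


def no_photo_reply(to_addr: str, reference_code: str) -> EmailMessage:
    """Build the reply sent when no photo was found (hypothetical helper)."""
    reply = EmailMessage()
    reply["To"] = to_addr
    reply["Subject"] = "We couldn't find any photos in your email"  # assumed subject
    reply.set_content(
        "Sorry, we couldn't find any photos attached to your email. "
        "If you think this is an error, please reply to this email and "
        "tell us about it.\n"
        f"Reference code: {reference_code}\n"
    )
    return reply
```

The reference code is passed in rather than generated here, so that the same value can be used both in the reply and in the name of the stored file.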
In some cases users won't be able to receive replies, such as when they're using a mobile device that can send but is not configured to receive email. We also can't reply automatically when an email appears to have been processed correctly but the result isn't what the user intended, such as when extra files were included, not all the files were found, or the text was mangled by an encoding issue. For this reason, it's also useful to store the incoming mails in a format that's easily searchable given only the user's identifier. When a user notices something has gone wrong, you can dig through all the email they've sent and find the message in question, assuming you actually received it. At Flickr, the naming convention puts each email in a directory for the date and hour it arrived and prefixes the filename with the user's identifier.
For example, when I send an email to firstname.lastname@example.org, it gets logged to the file /2005/09/29/21/hello29world-20050929-210346-8473.email. When I report an error saying I sent the mail on September 29, the programmer can search through the files in /2005/09/29 for anything matching hello29world-*.email.
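The layout in the example above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the meaning of the final number in the filename (8473 in the example) isn't stated here, so the sketch treats it as an opaque suffix supplied by the caller.

```python
from datetime import date, datetime
from pathlib import Path


def stored_mail_path(root: Path, user_id: str, received: datetime, suffix: str) -> Path:
    """Build <root>/YYYY/MM/DD/HH/<user>-YYYYMMDD-HHMMSS-<suffix>.email."""
    stamp = received.strftime("%Y%m%d-%H%M%S")
    directory = root / received.strftime("%Y/%m/%d/%H")
    return directory / f"{user_id}-{stamp}-{suffix}.email"


def save_mail(root: Path, user_id: str, received: datetime, suffix: str, raw: bytes) -> str:
    """Write the raw mail to disk and return its reference code."""
    path = stored_mail_path(root, user_id, received, suffix)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw)
    # The reference code is the filename minus the user prefix and extension.
    return f"{received.strftime('%Y%m%d-%H%M%S')}-{suffix}"


def find_mails(root: Path, user_id: str, day: date) -> list[Path]:
    """List every stored mail from one user on one day, across all hours."""
    day_dir = root / day.strftime("%Y/%m/%d")
    return sorted(day_dir.glob(f"*/{user_id}-*.email"))
```

Calling find_mails(root, "hello29world", date(2005, 9, 29)) performs the same search the programmer does by hand in the example: every hour directory under /2005/09/29 is checked for files matching hello29world-*.email.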
Once we've started capturing inputs, we can create a list of test cases, taking these known inputs and documenting the expected outputs. By creating a script to feed each of our inputs through the system and compare the actual output against the expected output, we can check that all inputs produce expected outputs.
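A script of that kind can be quite small. The sketch below assumes a layout that is not described in the text: each captured input is stored as name.email with its expected output beside it as name.expected.json, and the parser is a function that takes the raw bytes and returns a JSON-compatible dictionary. The name run_suite is likewise hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def run_suite(case_dir: Path, parse: Callable[[bytes], dict]) -> list[str]:
    """Feed every stored mail through the parser and report mismatches."""
    failures = []
    for mail_path in sorted(case_dir.glob("*.email")):
        # Assumed convention: expected output sits next to the input file.
        expected_path = mail_path.with_name(mail_path.stem + ".expected.json")
        expected = json.loads(expected_path.read_text(encoding="utf-8"))
        try:
            actual = parse(mail_path.read_bytes())
        except Exception as exc:
            # A crash in the parser counts as a failed case, not a failed run.
            failures.append(f"{mail_path.name}: parser raised {exc!r}")
            continue
        if actual != expected:
            failures.append(
                f"{mail_path.name}: expected {expected!r}, got {actual!r}"
            )
    return failures
```

An empty list means every input produced its expected output. Returning the failures rather than stopping at the first one lets a single run show everything a change has broken.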
When we come across a new parsing issue, we can immediately enter the input and expected output into the test harness and then use test-driven development to fix the bug. Once the input is producing the expected output, we rerun the whole suite of tests to check that we didn't break anything while fixing the latest issue.
As time goes by, your test suite will grow in size, covering all of the odd formats and corner cases you've come across. The Flickr email test suite covers a few hundred mails, each of which caused incorrect behavior in the parser at some point. By running these tests every time we touch the parsing code, we can be confident that we haven't recreated old bugs, and we have a good chance of catching new ones.