18.1. Testing

In this chapter, I distinguish between two rather different kinds of testing: unit testing and system testing. Testing is a rich and important field, and many more distinctions could be drawn, but I focus on the issues of most immediate importance to software developers. Many developers are reluctant to spend time on testing, seeing it as time subtracted from "real" development, but this attitude is short-sighted: defects are easier to fix the earlier you find them, so each hour spent developing tests can amply repay itself by catching defects as soon as possible, saving many hours of debugging that would otherwise be needed in later phases of the software development cycle.

18.1.1. Unit Testing and System Testing

Unit testing means writing and running tests to exercise a single module or an even smaller unit, such as a class or function. System testing (also known as functional or integration testing) involves running an entire program with known inputs. Some classic books on testing also draw the distinction between white-box testing, done with knowledge of a program's internals, and black-box testing, done without such knowledge. This classic viewpoint parallels, but does not exactly duplicate, the modern one of unit versus system testing.

Unit and system testing serve different goals. Unit testing proceeds apace with development; you can and should test each unit as you're developing it. One modern approach is known as test-driven development (TDD): for each feature that your program must have, you first write unit tests and only then do you proceed to write code that implements the feature. TDD may seem upside-down, but it has several advantages. For example, it ensures that you won't omit unit tests for some feature. Further, developing test-first is helpful because it urges you to focus first on the tasks a certain function, class, or method should accomplish, and to deal only afterward with how to implement that function, class, or method. A recent important innovation along the lines of TDD is behavior-driven development; see http://behavior-driven.org/ for all details about this new and promising development methodology.

In order to test a unit, which may depend on other units not yet fully developed, you often have to write stubs, which are fake implementations of various units' interfaces that give known and correct responses in cases needed to test other units. The Mock module (http://python-mock.sourceforge.net/) can be very helpful in the implementation of many such stubs (a good tutorial, with links to other useful documents about Mock Objects, is at http://xper.org/wiki/seminar/PythonMockObjectTutorial).
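
For example, suppose a unit under test needs an exchange-rate source that will eventually fetch rates over the network; a hand-rolled stub (the names and interface here are hypothetical, chosen just for illustration) supplies fixed, known rates instead:

class StubRateSource(object):
    """ A hand-rolled stub: returns fixed, known exchange rates
    instead of fetching them over the network. """
    def get_rate(self, currency):
        return {'EUR': 1.25, 'GBP': 1.80}[currency]

def convert(amount, currency, source):
    """ The unit under test: depends only on the source's interface. """
    return amount * source.get_rate(currency)

# a unit test passes the stub instead of the real, network-based source:
assert convert(10, 'EUR', StubRateSource()) == 12.5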

System testing comes afterward, since it requires the system to exist, with some subset of system functionality believed to be in working condition. System testing provides a sanity check: given that each module in the program works properly (passes unit tests), does the whole program work? If each unit is okay but the system as a whole is not, there is a problem in the integration between units, meaning the way the units cooperate. For this reason, system testing is also known as integration testing.

System testing is similar to running the system in production use, except that you fix the inputs in advance so that any problems you may find are easy to reproduce. The cost of failures in system testing is lower than in production use, since outputs from system testing are not used to make decisions, serve customers, control external systems, and so on. Rather, outputs from system testing are systematically compared with the outputs that the system should produce given the known inputs. The purpose is to find, in a cheap and reproducible way, any discrepancy between what the program should do and what the program actually does.

Failures discovered by system testing, just like system failures in production use, may reveal some defects in unit tests, as well as defects in the code. Unit testing may have been insufficient: a module's unit tests may have failed to exercise all needed functionality of that module. In this case, the unit tests clearly need to be beefed up. Do that before you change your code to fix the problem, then run the newly strengthened unit tests to confirm that they do now show the existence of the problem. Then fix the problem and run unit tests again to confirm they show no problem anymore. Finally, rerun the system tests to confirm that the problem has indeed gone away.

Most often, failures in system testing reveal communication problems within the development team: a module correctly implements a certain functionality, but another module expects different functionality. This kind of problem (an integration problem in the strict sense) is hard to pinpoint in unit testing. In good development practice, unit tests must run often, so it is crucial that they run fast. It's therefore essential that each unit can assume other units are working correctly and as expected.

Unit tests that are run in reasonably late stages of development can reveal integration problems if the system architecture is hierarchical, a common and reasonable organization. In such an architecture, lower-level modules depend on no others (except perhaps library modules, which you can assume to be correct), and thus the unit tests of such lower-level modules, if complete, suffice to assure correctness. Higher-level modules depend on lower-level ones, and therefore also depend on correct understanding about what functionality each module expects and supplies. Running complete unit tests on higher-level modules, using the true lower-level modules rather than stubs, exercises the interface between modules, as well as the higher-level modules' own code.

Unit tests for higher-level modules are thus run in two ways. You run the tests with stubs for the lower levels during the early stages of development when the lower-level modules are not yet ready or, later, when you need to check only the correctness of the higher levels. During later stages of development, you also regularly run the higher-level modules' unit tests using the true lower-level modules. In this way, you check the correctness of the whole subsystem, from the higher levels downward. Nevertheless, even in this favorable case, you still need to write and run system tests to make sure the whole system's functionality is exercised and checked, and that no interface between modules is neglected.

System testing is similar to running the program in normal ways. You need special support only to ensure that known inputs are supplied and that outputs are captured for comparison with expected outputs. This is easy for programs that perform I/O on files, but terribly hard for programs whose I/O relies on a GUI, network, or other communication with independent external entities. To simulate such external entities and make them predictable and entirely observable, you generally need platform-dependent infrastructure. Another useful piece of supporting infrastructure for system testing is a testing framework that automates the running of system tests, including logging of successes and failures. Such a framework can also help testers prepare sets of known inputs and corresponding expected outputs.
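
As a minimal sketch of such supporting infrastructure for a program that performs I/O on files, a driver script might run the program under test with a known input file and compare the captured output with a file of expected output (the program and file names here are hypothetical):

import subprocess

def run_system_test(input_path, expected_path):
    # run the program under test with a known input file, capturing its output
    process = subprocess.Popen(['python', 'theprogram.py', input_path],
                               stdout=subprocess.PIPE)
    actual_output = process.communicate()[0]
    expected_output = open(expected_path, 'rb').read()
    if actual_output == expected_output:
        print 'ok:', input_path
    else:
        print 'FAIL:', input_path

run_system_test('systest/input1.txt', 'systest/expected1.txt')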

Both free and commercial programs for these purposes exist, but they are not dependent on which programming languages are used in the system under test. System testing is a close kin to what was classically known as black-box testing, or testing that is independent from the implementation of the system under test (and therefore, in particular, independent from the programming languages used for implementation). Instead, testing frameworks usually depend on the operating system platform on which they run, since the tasks they perform are platform-dependent: running programs with given inputs, capturing their outputs, and particularly simulating and capturing GUI, network, and other interprocess communication I/O. Since frameworks for system testing depend on the platform and not on programming languages, I do not cover them further in this book.

18.1.2. The doctest Module

The doctest module has the primary purpose of letting you create good usage examples in your code's docstrings by checking that the examples do in fact produce the results that your docstrings show for them. doctest recognizes such examples by looking within the docstring for the interactive Python prompt '>>> ', followed on the same line by a Python statement, and the statement's expected output on the next line.

As you're developing a module, keep the docstrings up to date and gradually enrich them with examples. Each time part of the module (e.g., a function) is ready, or even partially ready, make it a habit to add examples to the docstrings. Import the module into an interactive session, and interactively use the parts you just developed in order to provide examples with a mix of typical cases, limit cases, and failing cases. For this specific purpose only, use from module import * so that your examples don't prefix module. to each name the module supplies. Copy and paste the text of the interactive session into the docstring in an editor, adjust any mistakes, and you're almost done.

Your documentation is now enriched with examples, and readers will have an easier time following it, assuming you choose a good mix of examples and season it wisely with nonexample text. Make sure you have docstrings, with examples, for your module as a whole, and for each function, class, and method that the module exports. You may choose to skip functions, classes, and methods whose names start with _, since, as their names indicate, they're meant to be private implementation details; doctest by default ignores them, and so should most readers of your module.

Examples that don't match the way your code works are worse than useless. Documentation and comments are useful only if they match reality. Docstrings and comments often get out of date as code changes, and then they become misinformation, hampering rather than helping any reader of the source. Better to have no comments and docstrings at all than to have ones that lie. doctest helps you keep the examples in your docstrings matched to what your code actually does. A failing doctest run should prompt you to review the docstring that contains the failing examples, reminding you to keep the docstring's text updated, too.

At the end of your module's source, insert the following small snippet:

if __name__ == '__main__':
    import doctest
    doctest.testmod()

This code calls function testmod of module doctest when you run your module as the main program. testmod examines all relevant docstrings (the module docstring, and docstrings of all public functions, public classes, and public methods of public classes). In each docstring, testmod finds all examples (by looking for occurrences of the interpreter prompt '>>> ', possibly preceded by whitespace) and runs each example. testmod checks that each example's results are equal to the output given in the docstring right after the example. In the case of exceptions, testmod ignores the traceback, and just checks that the expected and observed error messages are equal.

When everything goes right, testmod terminates silently. Otherwise, it outputs detailed messages about examples that failed, showing expected and actual output. Example 18-1 shows a typical example of doctest at work on a module mod.py.

Example 18-1. Using doctest

"""
This module supplies a single function reverseWords that reverses a string by words.

>>> reverseWords('four score and seven years')
'years seven and score four'
>>> reverseWords('justoneword')
'justoneword'
>>> reverseWords('')
''

You must call reverseWords with one argument, and it must be a string:

>>> reverseWords()
Traceback (most recent call last):
   ...
TypeError: reverseWords() takes exactly 1 argument (0 given)
>>> reverseWords('one', 'another')
Traceback (most recent call last):
   ...
TypeError: reverseWords() takes exactly 1 argument (2 given)
>>> reverseWords(1)
Traceback (most recent call last):
   ...
AttributeError: 'int' object has no attribute 'split'
>>> reverseWords(u'however, unicode is all right too')
u'too right all is unicode however,'

As a side effect, reverseWords eliminates any redundant spacing:

>>> reverseWords('with   redundant   spacing')
'spacing redundant with'

"""
def reverseWords(astring):
    words = astring.split()
    words.reverse()
    return ' '.join(words)

if __name__ == '__main__':
    import doctest, sys
    doctest.testmod(sys.modules[__name__])

In this module's docstring, I have snipped the tracebacks and replaced them with an ellipsis: this is good common practice, since doctest ignores tracebacks and they add nothing to the explanatory value of each failing case. Apart from this snipping, the docstring is a copy and paste of an interactive session, with the addition of some explanatory text and empty lines for readability. Save this source as mod.py, and then run it with python mod.py. It produces no output, meaning that all examples work right. Try python mod.py -v to get an account of all tests tried and a verbose summary at the end. Finally, alter the example results in the module docstring, making them incorrect, to see the messages doctest provides for errant examples.

While doctest is not primarily meant for general-purpose unit testing, it can nevertheless be a very convenient tool for the purpose. The recommended way to do unit testing in Python is with module unittest, covered in "The unittest Module" on page 457. However, unit testing with doctest can be easier and faster to set up, since it requires little more than copy and paste from an interactive session. If you need to maintain a module that lacks unit tests, retrofitting such tests into the module with doctest is a reasonable compromise. It's certainly better to have doctest-based unit tests than not to have any unit tests at all, as might otherwise happen should you decide that setting up tests "properly" with unittest would take you too long.

If you do decide to use doctest for unit testing, don't cram extra tests into your module's docstrings. This would damage the docstrings by making them too long and hard to read. Keep in the docstrings the right amount and kind of examples, strictly for explanatory purposes, just as if unit testing was not in the picture. Instead, put the extra tests into a global variable of your module, a dictionary named __test__. The keys in __test__ are strings used as arbitrary test names, and the corresponding values are strings that doctest picks up and uses in just the same way as it uses docstrings. The values in __test__ may also be function and class objects, in which case doctest examines their docstrings for tests to run. This latter feature is a convenient way to run doctest on objects with private names, which doctest skips by default.
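
For example, mod.py from Example 18-1 might gain a few extra tests through __test__ (the test cases shown here are hypothetical additions, chosen just for illustration) without touching its docstring:

__test__ = {
    'many_words': """
>>> reverseWords('a b c d e f g h')
'h g f e d c b a'
""",
    'leading_and_trailing_spaces': """
>>> reverseWords('  hello   world  ')
'world hello'
""",
}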

In Python 2.4, the doctest module also supplies two functions that return instances of the unittest.TestSuite class based on doctests so that you can integrate such tests into testing frameworks based on unittest. The documentation for this advanced functionality is at http://docs.python.org/lib/doctest-unittest-api.html.
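
For example, assuming mod.py from Example 18-1 is importable, doctest.DocTestSuite can wrap its docstring examples into a suite that unittest's text runner executes:

import doctest, unittest
import mod

# wrap mod's docstring examples as unittest test cases and run them
suite = doctest.DocTestSuite(mod)
unittest.TextTestRunner().run(suite)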

18.1.3. The unittest Module

The unittest module is the Python version of a unit-testing framework originally developed by Kent Beck for Smalltalk. Similar, widespread versions of the framework also exist for many other programming languages (e.g., the JUnit package for Java) and are often collectively referred to as xUnit.

To use unittest, you don't put your testing code in the same source file as the tested module, but instead write a separate test module for each module you're testing. A popular convention is to name the test module the same as the module being tested, with a prefix such as 'test_', and put it in a subdirectory named test of the directory where you keep your sources. For example, the test module for mod.py can be test/test_mod.py. You need a simple and consistent naming convention to make it easy for you to write and maintain auxiliary scripts that find and run all unit tests for a package.
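
As a sketch of such an auxiliary script (assuming the test/test_*.py convention just described), you might collect every test module into one suite and run it:

import glob, os.path, sys, unittest

def all_tests(test_dir='test'):
    sys.path.insert(0, test_dir)
    suite = unittest.TestSuite()
    for path in glob.glob(os.path.join(test_dir, 'test_*.py')):
        module_name = os.path.splitext(os.path.basename(path))[0]
        module = __import__(module_name)
        suite.addTest(unittest.defaultTestLoader.loadTestsFromModule(module))
    return suite

if __name__ == '__main__':
    unittest.TextTestRunner().run(all_tests())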

Separation between a module's source code and its unit-testing code lets you refactor the module more easily, including possibly recoding its functionality in C, without perturbing the unit-testing code. Knowing that test_mod.py stays intact, whatever changes you make to mod.py, enhances your confidence that passing the tests in test_mod.py indicates that mod.py still works correctly after the changes.

A unit-testing module defines one or more subclasses of unittest's TestCase class. Each subclass specifies one or more test cases by defining test-case methods, which are methods that are callable without arguments and whose names start with test. The subclass may also override method setUp, which the framework calls to prepare a new instance just before calling each test-case method, and tearDown, which the framework calls to clean things up just after calling each test-case method. Each test-case method calls methods of class TestCase whose names start with assert in order to express the conditions that the test must meet. unittest runs the test-case methods within a TestCase subclass in arbitrary order, each on a new instance of the subclass, running setUp just before each test case and tearDown just after each test case.

unittest provides other facilities, such as grouping test cases into test suites, and other more advanced functionality. You do not need such extras unless you're defining a custom unit-testing framework or, at the very least, structuring complicated testing procedures for equally complicated packages. In almost all cases, the concepts and details covered in this section are sufficient to perform effective and systematic unit testing. Example 18-2 shows how to use unittest to provide unit tests for the module mod.py of Example 18-1. For illustration purposes, this example uses unittest to perform exactly the same tests that Example 18-1 uses as examples in docstrings using doctest.

Example 18-2. Using unittest

""" This module tests function reverseWords provided by module mod.py. """
import unittest
import mod

class ModTest(unittest.TestCase):

    def testNormalCase(self):
        self.assertEqual(mod.reverseWords('four score and seven years'),
            'years seven and score four')

    def testSingleWord(self):
        self.assertEqual(mod.reverseWords('justoneword'), 'justoneword')

    def testEmpty(self):
        self.assertEqual(mod.reverseWords(''), '')

    def testRedundantSpacing(self):
        self.assertEqual(mod.reverseWords('with   redundant   spacing'),
            'spacing redundant with')

    def testUnicode(self):
        self.assertEqual(mod.reverseWords(u'unicode is all right too'),
            u'too right all is unicode')

    def testExactlyOneArgument(self):
        self.assertRaises(TypeError, mod.reverseWords)
        self.assertRaises(TypeError, mod.reverseWords, 'one', 'another')

    def testMustBeString(self):
        self.assertRaises((AttributeError,TypeError), mod.reverseWords, 1)

if __name__ == '__main__':
    unittest.main()

Running this script with python test_mod.py is a bit more verbose than using python mod.py to run doctest, as in Example 18-1. test_mod.py outputs a . (dot) for each test-case method it runs, then a separator line of dashes, and finally a summary line, such as "Ran 7 tests in 0.110s," and a final line of "OK" if every test passed.

Each test-case method makes one or more calls to methods whose names start with assert (or their synonyms whose names start with fail). Here, method testExactlyOneArgument is the only one with two such calls. In more complicated cases, multiple calls to assert methods from a single test-case method are quite common.

Even in a case as simple as this, one minor aspect shows that, for unit testing, unittest is more powerful and flexible than doctest. In method testMustBeString, we pass as the first argument to assertRaises a pair of exception classes, meaning we accept either kind of exception. test_mod.py therefore accepts as valid multiple implementations of mod.py. It accepts the implementation in Example 18-1, which tries calling method split on its argument, and therefore raises AttributeError when called with an argument that is not a string. However, it also accepts a different hypothetical implementation, one that raises TypeError instead when called with an argument of the wrong type. It would be possible to code this testing functionality with doctest, but it would be awkward and nonobvious, while unittest makes it simple and natural.

This kind of flexibility is crucial for real-life unit tests, which to some extent are executable specifications for their modules. You could, pessimistically, view the need for flexibility as indicating that the interface of the code you're testing is not well defined. However, it's best to view the interface as being defined with a useful amount of flexibility for the implementer: under circumstance X (argument of invalid type passed to function reverseWords, in this example), either of two things (raising AttributeError or TypeError) is allowed to happen.

Thus, implementations with either of the two behaviors are correct, and the implementer can choose between them on the basis of such considerations as performance and clarity. By viewing unit tests as executable specifications for their modules (the modern view, and the basis of test-first development) rather than as white-box tests strictly constrained to a specific implementation (as in some traditional taxonomies of testing), the tests become a more vital component of the software development process.

18.1.3.1. The TestCase class

With unittest, you write test cases by subclassing class TestCase and adding methods, callable without arguments, whose names start with test. Such test-case methods, in turn, call methods that your subclass inherits from TestCase, whose names start with assert (or their synonyms, whose names start with fail), to indicate conditions that must hold for the test to succeed.

Class TestCase also defines two methods that your subclass can optionally override in order to group actions to perform right before and right after each test-case method runs. This doesn't exhaust TestCase's functionality, but you won't need the rest unless you're developing testing frameworks or performing some similarly advanced task. The frequently called methods in a TestCase instance t are the following.

assert_, failUnless

t.assert_(condition,msg=None)

Fails and outputs msg if condition is false; otherwise, does nothing. The underscore in the name is needed because assert is a Python keyword. failUnless is a synonym. Do not use these methods when you can use more specific ones, such as assertEqual.

assertAlmostEqual, failUnlessAlmostEqual

t.assertAlmostEqual(first,second,places=7,msg=None)

Fails and outputs msg if first!=second to within places decimal digits; otherwise, does nothing. failUnlessAlmostEqual is a synonym. Almost always, these methods are preferable to assertEqual when the values you compare are floats.

assertEqual, failUnlessEqual

t.assertEqual(first,second,msg=None)

Fails and outputs msg if first!=second; otherwise, does nothing. failUnlessEqual is a synonym.

assertNotAlmostEqual, failIfAlmostEqual

t.assertNotAlmostEqual(first,second,places=7,msg=None)

Fails and outputs msg if first==second to within places decimal digits; otherwise, does nothing. failIfAlmostEqual is a synonym.

assertNotEqual, failIfEqual

t.assertNotEqual(first,second,msg=None)

Fails and outputs msg if first==second; otherwise, does nothing. failIfEqual is a synonym.

assertRaises, failUnlessRaises

t.assertRaises(exceptionSpec,callable,*args,**kwargs)

Calls callable(*args,**kwargs). Fails if the call doesn't raise any exception. If the call raises an exception that does not meet exceptionSpec, assertRaises propagates the exception. If the call raises an exception that meets exceptionSpec, assertRaises does nothing. exceptionSpec can be an exception class or a tuple of classes, just like the first argument of the except clause in a try/except statement. failUnlessRaises is a synonym.

fail

t.fail(msg=None)

Fails unconditionally and outputs msg.

failIf

t.failIf(condition, msg=None)

Fails and outputs msg if condition is true; otherwise, does nothing.

setUp

t.setUp()

The framework calls t.setUp() just before calling a test-case method. The implementation in TestCase does nothing. This method is provided in order to let your subclass override it if it needs to perform some preparation for each test.

tearDown

t.tearDown()

The framework calls t.tearDown() just after calling a test-case method. The implementation in TestCase does nothing. This method is provided in order to let your subclass override it if it needs to perform some cleanup after each test.
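
For example, a TestCase subclass that exercises code reading from files might use setUp and tearDown to create and remove a scratch file around each test-case method (a minimal sketch, not tied to any specific module in this chapter):

import os, tempfile, unittest

class TempFileTest(unittest.TestCase):

    def setUp(self):
        # runs before each test-case method: create a scratch file
        fd, self.filename = tempfile.mkstemp()
        os.write(fd, 'four score and seven years')
        os.close(fd)

    def tearDown(self):
        # runs after each test-case method: remove the scratch file
        os.remove(self.filename)

    def testFileHasExpectedContents(self):
        self.assertEqual(open(self.filename).read(),
                         'four score and seven years')

if __name__ == '__main__':
    unittest.main()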


18.1.3.2. Unit tests dealing with large amounts of data

Unit tests must be fast, since they are run frequently during development. Therefore, it's best to unit-test each aspect of your modules' functionality on small amounts of data when possible. This makes each unit test faster and lets you conveniently embed all needed data in the test's source code. When you test a function that reads from or writes to a file object, in particular, you normally use an instance of class StringIO from module cStringIO (covered in "The StringIO and cStringIO Modules" on page 229) to simulate a file object while holding the data in memory; this approach is faster than writing to disk and saves you the bother of having to remove disk files after your tests have run.
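
For example, to unit-test a hypothetical function writeWords, which writes each word of a string to a file object on its own line, the test can pass an in-memory StringIO instance instead of a real file:

import unittest
from cStringIO import StringIO

def writeWords(astring, afile):
    # hypothetical function under test: writes one word per line
    for word in astring.split():
        afile.write(word + '\n')

class WriteWordsTest(unittest.TestCase):

    def testOneWordPerLine(self):
        fakefile = StringIO()            # in-memory "file", no disk I/O
        writeWords('four score and seven years', fakefile)
        self.assertEqual(fakefile.getvalue(),
                         'four\nscore\nand\nseven\nyears\n')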

However, in some rare cases, it may be impossible to fully exercise a module's functionality without supplying and/or comparing data in quantities larger than can be reasonably embedded in a test's source code. In such cases, your unit test will have to rely on auxiliary, external datafiles to hold the data it needs to supply to the module it tests and/or the data it needs to compare to the tested module's output. Even then, you're generally better off reading the data into StringIO instances rather than directing the tested module to perform actual disk I/O. Even more important, I strongly suggest that you generally use stubs to test modules meant to interact with other external entities, such as a database, a GUI, or some other program over a network. It's easier for you to control all aspects of the test when using stubs rather than real external entities. Also, to reiterate, the speed at which you can run unit tests is important, and it's invariably faster to perform simulated operations in stubs than real ones.
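
For example, a function that is meant to record results through a DB-API cursor can be exercised with a hand-rolled stub that merely records the SQL it is asked to execute (both the function under test and the stub shown here are hypothetical):

import unittest

class StubCursor(object):
    """ A fake DB-API cursor: records SQL statements instead of running them. """
    def __init__(self):
        self.executed = []
    def execute(self, sql, parameters=()):
        self.executed.append((sql, parameters))

def store_result(cursor, name, value):
    # hypothetical function under test: inserts one row through the cursor
    cursor.execute('INSERT INTO results (name, value) VALUES (?, ?)',
                   (name, value))

class StoreResultTest(unittest.TestCase):

    def testStoresOneRow(self):
        cursor = StubCursor()
        store_result(cursor, 'answer', 42)
        self.assertEqual(len(cursor.executed), 1)
        self.assertEqual(cursor.executed[0][1], ('answer', 42))

if __name__ == '__main__':
    unittest.main()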

