I l@ve RuBoard

4.4 A Regression Test Script

As we've seen, Python provides interfaces to a variety of system services, along with tools for adding others. Example 4-5 shows some commonly used services in action. It implements a simple regression-test system, by running a command-line program with a set of given input files and comparing the output of each run to the prior run's results. This script was adapted from an automated testing system I wrote to catch errors introduced by changes in program source files; in a big system, you might not know when a fix is really a bug in disguise.

Example 4-5. PP2E\System\Filetools\regtest.py

#!/usr/local/bin/python
import os, sys                            # get unix, python services 
from stat import ST_SIZE                  # or use os.path.getsize
from glob import glob                     # file name expansion
from os.path import exists                # file exists test
from time import time, ctime              # time functions

print 'RegTest start.' 
print 'user:', os.environ['USER']         # environment variables
print 'path:', os.getcwd(  )              # current directory
print 'time:', ctime(time(  )), '\n'
program = sys.argv[1]                     # two command-line args
testdir = sys.argv[2]

for test in glob(testdir + '/*.in'):      # for all matching input files
    if not exists('%s.out' % test):
        # no prior results
        os.system('%s < %s > %s.out 2>&1' % (program, test, test))
        print 'GENERATED:', test
    else: 
        # backup, run, compare
        os.rename(test + '.out', test + '.out.bkp')
        os.system('%s < %s > %s.out 2>&1' % (program, test, test))
        os.system('diff %s.out %s.out.bkp > %s.diffs' % ((test,)*3) )
        if os.stat(test + '.diffs')[ST_SIZE] == 0:
            print 'PASSED:', test 
            os.remove(test + '.diffs')
        else:
            print 'FAILED:', test, '(see %s.diffs)' % test

print 'RegTest done:', ctime(time(  ))

Some of this script is Unix-biased. For instance, the 2>&1 syntax to redirect stderr works on Unix and Windows NT/2000, but not on Windows 9x, and the diff command line spawned is a Unix utility. You'll need to tweak such code a bit to run this script on some platforms. Also, given the improvements to the os module's popen calls in Python 2.0, they have now become a more portable way to redirect streams in such a script, and an alternative to shell command redirection syntax.

But this script's basic operation is straightforward: for each filename with an .in suffix in the test directory, this script runs the program named on the command line and looks for deviations in its results. This is an easy way to spot changes (called "regressions") in the behavior of programs spawned from the shell. The real secret of this script's success is in the filenames used to record test information: within a given test directory testdir :

testdir/test.in files represent standard input sources for program runs.
testdir/test.in.out files represent the output generated for each input file.
testdir/test.in.out.bkp files are backups of prior .in.out result files.
testdir/test.in.diffs files represent regressions; output file differences.

Output and difference files are generated in the test directory, with distinct suffixes. For example, if we have an executable program or script called shrubbery, and a test directory called test1 containing a set of .in input files, a typical run of the tester might look something like this:

% regtest.py shrubbery test1
RegTest start.
user: mark
path: /home/mark/stuff/python/testing
time: Mon Feb 26 21:13:20 1996

FAILED: test1/t1.in (see test1/t1.in.diffs)
PASSED: test1/t2.in
FAILED: test1/t3.in (see test1/t3.in.diffs)
RegTest done: Mon Feb 26 21:13:27 1996

Here, shrubbery is run three times, for the three .in canned input files, and the results of each run are compared to output generated for these three inputs the last time testing was conducted. Such a Python script might be launched once a day, to automatically spot deviations caused by recent source code changes (e.g., from a cron job on Unix).

We've already met system interfaces used by this script; most are fairly standard Unix calls, and not very Python-specific to speak of. In fact, much of what happens when we run this script occurs in programs spawned by os.system calls. This script is really just a driver ; because it is completely independent of both the program to be tested and the inputs it will read, we can add new test cases on the fly by dropping a new input file in a test directory.

So given that this script just drives other programs with standard Unix-like calls, why use Python here instead of something like C ? First of all, the equivalent program in C would be much longer: it would need to declare variables, handle data structures, and more. In C, all external services exist in a single global scope (the linker's scope); in Python, they are partitioned into module namespaces (os, sys, etc.) to avoid name clashes. And unlike C, the Python code can be run immediately, without compiling and linking; changes can be tested much quicker in Python. Moreover, with just a little extra work we could make this script run on Windows 9x too. As you can probably tell by now, Python excels when it comes to portability and productivity.

Because of such benefits, automated testing is a very common role for Python scripts. If you are interested in using Python for testing, be sure to see Python's web site (http://www.python.org) for other available tools (e.g., the PyUnit system).

Testing Gone Bad?

Once we learn about sending email from Python scripts in Chapter 11, you might also want to augment this script to automatically send out email when regularly run tests fail. That way, you don't even need to remember to check results. Of course, you could go further still.

One company I worked at added sound effects to compiler test scripts; you got an audible round of applause if no regressions were found, and an entirely different noise otherwise. (See the end of this chapter and playfile.py in Chapter 11 for audio hints.)

Another company in my development past ran a nightly test script that automatically isolated the source code file check-in that triggered a test regression, and sent a nasty email to the guilty party (and their supervisor). Nobody expects the Spanish Inquisition!

I l@ve RuBoard