|I l@ve RuBoard|
17.2 Implementing Stacks
Stacks are a common and straightforward data structure, used in a variety of applications: language processing, graph searches, etc. In short, stacks are a last-in-first-out collection of objects: the last item added to the collection is always the next one to be removed. Clients use stacks by:
Depending on client requirements, there may also be tools for such tasks as testing if the stack is empty, fetching the top item without popping it, iterating over a stack's items, testing for item membership, etc.
In Python, a simple list is often adequate for implementing a stack: because we can change lists in place, we can either add and delete items from the front (left) or end (right). Table 17-1 summarizes various built-in operations available for implementing stack-like behavior with Python lists, depending on whether the stack "top" is the first or last node in the list. In this table, string 'c' is the top item on the stack.
This list arrangement works and will be relatively fast. But it also binds stack-based programs to the stack representation chosen: stack operations will all be hardcoded. If we later want to change how a stack is represented, or extend its basic operations, we're stuck: every stack-based program will have to be updated.
For instance, to add logic that monitors the number of stack operations a program performs, we'd have to add code around each hardcoded stack operation. In a large system, this operation may be nontrivial. As we'll see in the next chapter, we may also decide to move stacks to a C-based implementation, if they prove to be a performance bottleneck. As a general rule, hardcoded operations on built-in data structures don't support future migrations as well as we'd sometimes like.
17.2.1 A Stack Module
Perhaps a better approach is to encapsulate stack implementations using Python's code reuse tools. Let's begin by implementing a stack as a module containing a Python list, plus functions to operate on it (see Example 17-1).
stack =  # on first import error = 'stack1.error' # local exceptions def push(obj): global stack # use 'global' to change stack = [obj] + stack # add item to the front def pop( ): global stack if not stack: raise error, 'stack underflow' # raise local error top, stack = stack, stack[1:] # remove item at front return top def top( ): if not stack: # raise local error raise error, 'stack underflow' # or let IndexError occur return stack def empty( ): return not stack # is the stack ? def member(obj): return obj in stack # item in stack? def item(offset): return stack[offset] # index the stack def length( ): return len(stack) # number entries def dump( ): print '<Stack:%s>' % stack
This module creates a list object (stack) and exports functions to manage access to it. The stack is declared global in functions that change it, but not in those that just reference it. The module also defines an error object (error) that can be used to catch exceptions raised locally in this module. Some stack errors are built-in exceptions: method item triggers IndexError for out-of-bounds indexes.
Most of the stack's functions just delegate the operation to the embedded list used to represent the stack. In fact, the module is really just a wrapper around a Python list. But this extra layer of interface logic makes clients independent of the actual implementation of the stack. So, we're able to change the stack later without impacting its clients.
As usual, one of the best ways to understand such code is to see it in action. Here's an interactive session that illustrates the module's interfaces:
C:\...\PP2E\Dstruct\Basic>python >>> import stack1 >>> stack1.push('spam') >>> stack1.push(123) >>> stack1.top( ) 123 >>> stack1.stack [123, 'spam'] >>> stack1.pop( ) 123 >>> stack1.dump( ) <Stack:['spam']> >>> stack1.pop( ) 'spam' >>> stack1.empty( ) 1 >>> for c in 'spam': stack1.push(c) ... >>> while not stack1.empty( ): ... print stack1.pop( ), ... m a p s
Other operations are analogous, but the main thing to notice here is that all stack operations are module functions. For instance, it's possible to iterate over the stack, but we need to use counter-loops and indexing function calls (item ). There's nothing preventing clients from accessing (and changing) stack1.stack directly, but doing so defeats the purpose of interfaces like this one.
17.2.2 A Stack Class
Perhaps the biggest drawback of the module-based stack is that it supports only a single stack object. All clients of the stack module effectively share the same stack. Sometimes we want this feature: a stack can serve as a shared-memory object for multiple modules. But to implement a true stack datatype, we need to use classes.
To illustrate, let's define a full-featured stack class. The Stack class shown in Example 17-2 defines a new datatype, with a variety of behavior. Like the module, the class uses a Python list to hold stacked objects. But this time, each instance gets its own list. The class defines both "real" methods, and specially named methods that implement common type operations. Comments in the code describe special methods.
error = 'stack2.error' # when imported: local exception class Stack: def __init__(self, start=): # self is the instance object self.stack =  # start is any sequence: stack.. for x in start: self.push(x) self.reverse( ) # undo push's order reversal def push(self, obj): # methods: like module + self self.stack = [obj] + self.stack # top is front of list def pop(self): if not self.stack: raise error, 'underflow' top, self.stack = self.stack, self.stack[1:] return top def top(self): if not self.stack: raise error, 'underflow' return self.stack def empty(self): return not self.stack # instance.empty( ) # overloads def __repr__(self): return '[Stack:%s]' % self.stack # print, backquotes,.. def __cmp__(self, other): return cmp(self.stack, other.stack) # '==', '>, '<=', '!=',.. def __len__(self): return len(self.stack) # len(instance), not instance def __add__(self, other): return Stack(self.stack + other.stack) # instance1 + instance2 def __mul__(self, reps): return Stack(self.stack * reps) # instance * reps def __getitem__(self, offset): return self.stack[offset] # intance[offset], in, for def __getslice__(self, low, high): return Stack(self.stack[low : high]) # instance[low:high] def __getattr__(self, name): return getattr(self.stack, name) # instance.sort()/reverse( )/..
Now distinct instances are created by calling the Stack class like a function. In most respects, the Stack class implements operations exactly like the stack module in Example 17-1. But here, access to the stack is qualified by self, the subject instance object. Each instance has its own stack attribute, which refers to the instance's own list. Furthermore, instance stacks are created and initialized in the __init__ constructor method, not when the module is imported. Let's make a couple of stacks to see how this all works in practice:
>>> from stack2 import Stack >>> x = Stack( ) # make a stack object, push items >>> x.push('spam') >>> x.push(123) >>> x # __repr__ prints a stack [Stack:[123, 'spam']] >>> y = Stack( ) # two distinct stacks objects >>> y.push(3.1415) # they do not share content >>> y.push(x.pop( )) >>> x, y ([Stack:['spam']], [Stack:[123, 3.1415]]) >>> z = Stack( ) # third distinct stack object >>> for c in 'spam': z.push(c) ... >>> while z: print z.pop( ), # __len__ tests stack truth ... m a p s >>> z = x + y # __add__ handles stack + >>> z # holds three different types [Stack:['spam', 123, 3.1415]] >>> for item in z: print item, # __getitem__ does for ... spam 123 3.1415
Like lists and dictionaries, Stack defines both methods and operators for manipulating instances by attribute qualification and expressions. Additionally, it defines the __getattr__ metaclass method to intercept references to attributes not defined in the class and to route them to the wrapped list object (to support list methods: sort, append, reverse, etc.). Many of the module's operations become operators in the class. Table 17-2 shows the equivalence of module and class operations (columns 1 and 2) and gives the class method that comes into play for each (column 3).
In effect, classes let us extend Python's set of built-in types with reusable types implemented in Python modules. Class-based types may be used just like built-in types: depending on which operation methods they define, classes can implement numbers, mappings, and sequences, and may be mutable or not. Class-based types may also fall somewhere in between these categories.
17.2.3 Customization: Performance Monitors
So far we've seen how classes support multiple instances and integrate better with Python's object model by defining operator methods. One of the other main reasons for using classes is to allow for future extensions and customizations. By implementing stacks with a class, we can later add subclasses that specialize the implementation for new demands.
For instance, suppose we've started using the Stack class in Example 17-2, but we start running into performance problems. One way to isolate bottlenecks is to instrument data structures with logic that keeps track of usage statistics, which we can analyze after running client applications. Because Stack is a class, we can add such logic in a new subclass, without affecting the original stack module (or its clients). The subclass in Example 17-3 extends Stack to keep track of overall push/pop usage frequencies and to record the maximum size of each instance.
from stack2 import Stack # extends imported Stack class StackLog(Stack): # count pushes/pops, max-size pushes = pops = 0 # shared/static class members def __init__(self, start=): # could also be module vars self.maxlen = 0 Stack.__init__(self, start) def push(self, object): Stack.push(self, object) # do real push StackLog.pushes = StackLog.pushes + 1 # overall stats self.maxlen = max(self.maxlen, len(self)) # per-instance stats def pop(self): StackLog.pops = StackLog.pops + 1 # overall counts return Stack.pop(self) # not 'self.pops': instance def stats(self): return self.maxlen, self.pushes, self.pops # get counts from instance
This subclass works the same as the original Stack; it just adds monitoring logic. The new stats method is used to get a statistics tuple through an instance:
>>> from stacklog import StackLog >>> x = StackLog( ) >>> y = StackLog( ) # make two stack objects >>> for i in range(3): x.push(i) # and push object on them ... >>> for c in 'spam': y.push(c) ... >>> x, y # run inherited __repr__ ([Stack:[2, 1, 0]], [Stack:['m', 'a', 'p', 's']]) >>> x.stats(), y.stats( ) ((3, 7, 0), (4, 7, 0)) >>> >>> y.pop(), x.pop( ) ('m', 2) >>> x.stats(), y.stats( ) # my maxlen, all pushes, all pops ((3, 7, 2), (4, 7, 2))
Notice the use of class attributes to record overall pushes and pops, and instance attributes for per-instance maximum length. By hanging attributes on different objects, we can expand or narrow their scopes.
17.2.4 Optimization: Tuple Tree Stacks
One of the nice things about wrapping objects up in classes is that you are free to change the underlying implementation without breaking the rest of your program. Optimizations can be added in the future, for instance, with minimal impact; the interface is unchanged, even if the internals are. There are a variety of ways to implement stacks, some more efficient than others. So far, our stacks have used slicing and concatenation to implement pushing and popping. This method is relatively inefficient: both operations make copies of the wrapped list object. For large stacks, this practice can add a significant time penalty.
One way to speed up such code is to change the underlying data structure completely. For example, we can store the stacked objects in a binary tree of tuples: each item may be recorded as a pair: (object, tree), where object is the stacked item, and tree is either another tuple pair giving the rest of the stack or None to designate an empty stack. A stack of items [1,2,3,4] would be internally stored as a tuple tree (1,(2,(3,(4,None)))).
This tuple-based representation is similar to the notion of "cons-cells" in Lisp-family languages: the object on the left is the car, and the rest of the tree on the right is the cdr. Because we add or remove only a top tuple to push and pop items, this structure avoids copying the entire stack. For large stacks, the benefit might be significant. The next class, shown in Example 17-4, implements these ideas.
class Stack: def __init__(self, start=): # init from any sequence self.stack = None # even other (fast)stacks for i in range(-len(start), 0): self.push(start[-i - 1]) # push in reverse order def push(self, node): # grow tree 'up/left' self.stack = node, self.stack # new root tuple: (node, tree) def pop(self): node, self.stack = self.stack # remove root tuple return node # TypeError if empty def empty(self): return not self.stack # is it 'None'? def __len__(self): # on: len, not len, tree = 0, self.stack while tree: len, tree = len+1, tree # visit right subtrees return len def __getitem__(self, index): # on: x[i], in, for len, tree = 0, self.stack while len < index and tree: # visit/count nodes len, tree = len+1, tree if tree: return tree # IndexError if out-of-bounds else: raise IndexError # so 'in' and 'for' stop def __repr__(self): return '[FastStack:' + `self.stack` + ']'
This class's __getitem__ method handles indexing, in tests, and for loop iteration as before, but this version has to traverse a tree to find a node by index. Notice that this isn't a subclass of the original Stack class. Since nearly every operation is implemented differently here, inheritance won't really help. But clients that restrict themselves to the operations that are common to both classes can still use them interchangeably -- they just need to import a stack class from a different module to switch implementations. Here's a session with this stack version; as long as we stick to pushing, popping, indexing, and iterating, this version is essentially indistinguishable from the original:
>>> from stack3 import Stack >>> x = Stack( ) >>> y = Stack( ) >>> for c in 'spam': x.push(c) ... >>> for i in range(3): y.push(i) ... >>> x [FastStack:('m', ('a', ('p', ('s', None))))] >>> y [FastStack:(2, (1, (0, None)))] >>> len(x), x, x[-1] (4, 'p', 'm') >>> x.pop( ) 'm' >>> x [FastStack:('a', ('p', ('s', None)))] >>> >>> while y: print y.pop( ), ... 2 1 0
17.2.5 Optimization: In-place List Modifications
Perhaps a better way to speed up the stack object, though, is to fall back on the mutability of Python's list object. Because lists can be changed in place, they can be modified more quickly than any of the prior examples. In-place change operations like append are prone to complications when a list is referenced from more than one place. But because the list inside the stack object isn't meant to be used directly, we're probably safe here. The module in Example 17-5 shows one way to implement a stack with in-place changes; some operator overloading methods have been dropped to keep this simple. The new Python pop method it uses is equivalent to indexing and deleting the item at offset -1 (top is end-of-list here).
error = 'stack4.error' # when imported: local exception class Stack: def __init__(self, start=): # self is the instance object self.stack =  # start is any sequence: stack.. for x in start: self.push(x) def push(self, obj): # methods: like module + self self.stack.append(obj) # top is end of list def pop(self): if not self.stack: raise error, 'underflow' return self.stack.pop( ) # like fetch and delete stack[-1] def top(self): if not self.stack: raise error, 'underflow' return self.stack[-1] def empty(self): return not self.stack # instance.empty( ) def __len__(self): return len(self.stack) # len(instance), not intance def __getitem__(self, offset): return self.stack[offset] # instance[offset], in, for def __repr__(self): return '[Stack:%s]' % self.stack
This version works like the original in module stack2, too -- just replace stack2 with stack4 in the previous interaction to get a feel for its operation. The only obvious difference is that stack items are in reverse when printed (i.e., the top is the end):
>>> from stack4 import Stack >>> x = Stack( ) >>> x.push('spam') >>> x.push(123) >>> x [Stack:['spam', 123]] >>> >>> y = Stack( ) >>> y.push(3.1415) >>> y.push(x.pop( )) >>> x, y ([Stack:['spam']], [Stack:[3.1415, 123]]) >>> y.top( ) 123
17.2.6 Timing the Improvements
The in-place changes stack object probably runs faster than both the original and the tuple-tree version, but the only way to really be sure how much faster is to time the alternative implementations. Since this could be something we'll want to do more than once, let's first define a general module for timing functions in Python. In Example 17-6, the built-in time module provides a clock function that we can use to get the current CPU time in floating-point seconds, and the function timer.test simply calls a function reps times and returns the number of elapsed seconds by subtracting stop from start CPU times.
def test(reps, func, *args): import time start = time.clock( ) # current CPU tim in float seconds for i in xrange(reps): # call function reps times apply(func, args) # discard any return value return time.clock( ) - start # stop time - start time
Next, we define a test driver script (see Example 17-7). It expects three command-line arguments: the number of pushes, pops, and indexing operations to perform (we'll vary these arguments to test different scenarios). When run at the top level, the script creates 200 instances of the original and optimized stack classes, and performs the specified number of operations on each stack. Pushes and pops change the stack; indexing just accesses it.
import stack2 # list-based stacks: [x]+y import stack3 # tuple-tree stacks: (x,y) import stack4 # in-place stacks: y.append(x) import timer # general function timer function rept = 200 from sys import argv pushes, pops, items = eval(argv), eval(argv), eval(argv) def stackops(stackClass): #print stackClass.__module__ x = stackClass('spam') # make a stack object for i in range(pushes): x.push(i) # exercise its methods for i in range(items): t = x[i] for i in range(pops): x.pop( ) print 'stack2:', timer.test(rept, stackops, stack2.Stack) # pass class to test print 'stack3:', timer.test(rept, stackops, stack3.Stack) # rept*(push+pop+ix) print 'stack4:', timer.test(rept, stackops, stack4.Stack)
Here are some of the timings reported by the test driver script. The three outputs represent the measured run times in seconds for the original, tuple, and in-place stacks. For each stack type, the first test creates 200 stack objects and performs roughly 120,000 stack operations (200 rept x (200 pushes + 200 indexes + 200 pops)), in the test duration times listed. These results were obtained on a 650 MHz Pentium III Windows machine, and a Python 1.5.2 install:
C:\...\PP2E\Dstruct\Basic>python stacktime.py 200 200 200 stack2: 1.67890008213 stack3: 7.70020952413 stack4: 0.694291724635 C:\...\PP2E\Dstruct\Basic>python stacktime.py 200 50 200 stack2: 1.06876246669 stack3: 7.48964866994 stack4: 0.477584270605 C:\...\PP2E\Dstruct\Basic>python stacktime.py 200 200 50 stack2: 1.34536448817 stack3: 0.795615917129 stack4: 0.57297976835 C:\...\PP2E\Dstruct\Basic>python stacktime.py 200 200 0 stack2: 1.33500477715 stack3: 0.300776077373 stack4: 0.533050336077
If you look closely enough, you'll notice that the results show that the tuple-based stack (stack3) performs better when we do more pushing and popping, but worse if we do much indexing. Indexing lists is extremely fast for built-in lists, but very slow for tuple trees -- the Python class must traverse the tree manually. The in-place change stacks (stack4) are almost always fastest, unless no indexing is done at all -- tuples won by a hair in the last test case. Since pushes and pops are most of what clients would do to a stack, tuples are a contender, despite their poor indexing performance. Of course, we're talking about fractions of a second after many tens of thousands of operations; in many applications, your users probably won't care either way.
|I l@ve RuBoard|