I l@ve RuBoard Previous Section Next Section

19.7 A C Extension Type String Stack

To implement multiple-instance objects in C, you need to code a C extension type, not a module. Like Python classes, C types generate multiple-instance objects and can overload (i.e., intercept and implement) Python expression operators and type operations. Unlike classes, though, types do not support attribute inheritance by themselves -- attributes are fetched from a flat names table, not a namespace objects tree. That makes sense if you realize that Python's built-in types are simply precoded C extension types; when you ask for the list append method, for instance, inheritance never enters the picture. We can add inheritance for types by coding "wrapper" classes, but it is a manual process (more on this later).

One of the biggest drawbacks of types, though, is their size -- to implement a realistically equipped C type, you need to code lots of not-very-pretty C code, and fill out type descriptor tables with pointers to link up operation handlers. In fact, C extension types are so complex that I'm going to cut some details here. To give you a feel for the overall structure, Example 19-15 presents a C string stack type implementation, but with the bodies of all its functions stripped out. For the complete implementation, see this file on the book's CD (see http://examples.oreilly.com/python2).

This C type roughly implements the same interface as the stack classes we met earlier in Chapter 17, but imposes a few limits on the stack itself and does not support specialization by subclassing (it's a type, not a class). The stripped parts use the same algorithms as the C module in Example 19-14, but operate on the passed-in self object, which now refers to the particular type instance object being processed, just as the first argument does in class methods. In types, self is a pointer to an allocated C struct that represents a type instance object.

Example 19-15. PP2E\Integrate\Extend\Stacks\stacktyp.c
/****************************************************
 * stacktyp.c: a character-string stack data-type;
 * a C extension type, for use in Python programs;
 * stacktype module clients can make multiple stacks;
 * similar to stackmod, but 'self' is the instance,
 * and we can overload sequence operators here;
 ****************************************************/

#include "Python.h"

static PyObject *ErrorObject;      /* local exception */
#define onError(message) \
       { PyErr_SetString(ErrorObject, message); return NULL; }

/*****************************************************************************
 * STACK-TYPE INFORMATION
 *****************************************************************************/

#define MAXCHARS 2048
#define MAXSTACK MAXCHARS

typedef struct {                 /* stack instance object format */
    PyObject_HEAD                /* python header: ref-count + &typeobject */
    int top, len;  
    char *stack[MAXSTACK];       /* per-instance state info */
    char strings[MAXCHARS];      /* same as stackmod, but multiple copies */
} stackobject;

/*****************************************************************************
 * INSTANCE METHODS
 *****************************************************************************/

static PyObject *             /* on "instance.push(arg)" */
stack_push(self, args)        /* 'self' is the stack instance object */
    stackobject *self;        /* 'args' are args passed to self.push method */
    PyObject    *args; 
{    ...
}
static PyObject *
stack_pop(self, args)
    stackobject *self;
    PyObject    *args;        /* on "instance.pop(  )" */
{    ...
}
static PyObject *
stack_top(self, args)
    stackobject *self;
    PyObject    *args;
{    ...
}
static PyObject *
stack_empty(self, args)
    stackobject *self;
    PyObject    *args;
{    ...
}
static struct PyMethodDef stack_methods[] = {     /* instance methods */
 {"push",       stack_push,     1},               /* name/address table */
 {"pop",        stack_pop,      1},               /* like list append,sort */
 {"top",        stack_top,      1},         
 {"empty",      stack_empty,    1},               /* extra ops besides optrs */
 {NULL,         NULL}                             /* end, for getattr here */
};

/*****************************************************************************
 * BASIC TYPE-OPERATIONS
 *****************************************************************************/

static stackobject *             /* on "x = stacktype.Stack(  )" */
newstackobject(  )                 /* instance constructor function */    
{   ...                            /* these don't get an 'args' input */
}
static void                      /* instance destructor function */
stack_dealloc(self)              /* when reference-count reaches zero */
    stackobject *self;
{   ...                            /* do cleanup activity */
}
static int
stack_print(self, fp, flags)
    stackobject *self;
    FILE *fp;
    int flags;                   /* print self to file */
{   ...
}
static PyObject *
stack_getattr(self, name)        /* on "instance.attr" reference  */
    stackobject *self;           /* make a bound-method or member */
    char *name; 
{   ...
}
static int
stack_compare(v, w)              /* on all comparisons */
    stackobject *v, *w;
{   ...
}

/*****************************************************************************
 * SEQUENCE TYPE-OPERATIONS
 *****************************************************************************/

static int
stack_length(self)
    stackobject *self;               /* called on "len(instance)" */
{   ...
}
static PyObject *
stack_concat(self, other)
    stackobject *self;               /* on "instance + other" */
    PyObject    *other;              /* 'self' is the instance */
{   ...
}
static PyObject *
stack_repeat(self, n)                /* on "instance * N" */
    stackobject *self;               /* new stack = repeat self n times */
    int n;
{   ...
}
static PyObject *
stack_item(self, index)              /* on "instance[offset]", "in/for" */
    stackobject *self;               /* return the i-th item of self */
    int index;                       /* negative index pre-adjusted */
{   ...
}
static PyObject *
stack_slice(self, ilow, ihigh)
    stackobject *self;               /* on "instance[ilow:ihigh]" */
    int ilow, ihigh;                 /* negative-adjusted, not scaled */
{   ...
}

/*****************************************************************************
 * TYPE DESCRIPTORS
 *****************************************************************************/

static PySequenceMethods stack_as_sequence = {  /* sequence supplement     */
      (inquiry)       stack_length,             /* sq_length    "len(x)"   */
      (binaryfunc)    stack_concat,             /* sq_concat    "x + y"    */
      (intargfunc)    stack_repeat,             /* sq_repeat    "x * n"    */
      (intargfunc)    stack_item,               /* sq_item      "x[i], in" */
      (intintargfunc) stack_slice,              /* sq_slice     "x[i:j]"   */
      (intobjargproc)     0,                    /* sq_ass_item  "x[i] = v" */
      (intintobjargproc)  0,                    /* sq_ass_slice "x[i:j]=v" */
};

static PyTypeObject Stacktype = {      /* main python type-descriptor */
  /* type header */                    /* shared by all instances */
      PyObject_HEAD_INIT(&PyType_Type)         
      0,                               /* ob_size */
      "stack",                         /* tp_name */
      sizeof(stackobject),             /* tp_basicsize */
      0,                               /* tp_itemsize */

  /* standard methods */
      (destructor)  stack_dealloc,     /* tp_dealloc  ref-count==0  */
      (printfunc)   stack_print,       /* tp_print    "print x"     */
      (getattrfunc) stack_getattr,     /* tp_getattr  "x.attr"      */
      (setattrfunc) 0,                 /* tp_setattr  "x.attr=v"    */
      (cmpfunc)     stack_compare,     /* tp_compare  "x > y"       */
      (reprfunc)    0,                 /* tp_repr     `x`, print x  */

  /* type categories */
      0,                               /* tp_as_number   +,-,*,/,%,&,>>,...*/
      &stack_as_sequence,              /* tp_as_sequence +,[i],[i:j],len, ...*/
      0,                               /* tp_as_mapping  [key], len, ...*/

  /* more methods */
      (hashfunc)     0,                /* tp_hash    "dict[x]" */
      (ternaryfunc)  0,                /* tp_call    "x(  )"     */
      (reprfunc)     0,                /* tp_str     "str(x)"  */

};  /* plus others: see Include/object.h */

/*****************************************************************************
 * MODULE LOGIC 
 *****************************************************************************/

static PyObject *
stacktype_new(self, args)                 /* on "x = stacktype.Stack(  )" */
    PyObject *self;                       /* self not used */
    PyObject *args;                       /* constructor args */
{
    if (!PyArg_ParseTuple(args, ""))      /* Module-method function */
        return NULL;
    return (PyObject *)newstackobject(  );  /* make a new type-instance object */
}                                         /* the hook from module to type... */

static struct PyMethodDef stacktype_methods[] = {
    {"Stack",  stacktype_new,  1},             /* one function: make a stack */ 
    {NULL,     NULL}                           /* end marker, for initmodule */
};

void
initstacktype(  )                 /* on first "import stacktype" */
{
    PyObject *m, *d;
    m = Py_InitModule("stacktype", stacktype_methods);   /* make the module, */
    d = PyModule_GetDict(m);                             /* with 'Stack' func */
    ErrorObject = Py_BuildValue("s", "stacktype.error");
    PyDict_SetItemString(d, "error", ErrorObject);       /* export exception */
    if (PyErr_Occurred(  ))
        Py_FatalError("can't initialize module stacktype");

}

19.7.1 Anatomy of a C Extension Type

Although most of file stacktyp.c is missing, there is enough here to illustrate the global structure common to C type implementations:

Instance struct

The file starts off by defining a C struct called stackobject that will be used to hold per-instance state information -- each generated instance object gets a newly malloc'd copy of the struct. It serves the same function as class instance attribute dictionaries, and contains data that was saved in global variables by the C stack module.

Instance methods

As in the module, a set of instance methods follows next; they implement method calls such as push and pop. But here, method functions process the implied instance object, passed in to the self argument. This is similar in spirit to class methods. Type instance methods are looked up in the registration table of the code listing (Example 19-15) when accessed.

Basic type operations

Next, the file defines functions to handle basic operations common to all types: creation, printing, qualification, and so on. These functions have more specific type signatures than instance method handlers. The object creation handler allocates a new stack struct, and initializes its header fields: the reference count is set to 1, and its type object pointer is set to the Stacktype type descriptor that appears later in the file.

Sequence operations

Functions for handling sequence type operations come next. Stacks respond to most sequence operators: len, +, *, and [i]. Much like the __getitem__ class method, the stack_item indexing handler performs indexing, but also in membership tests and for iterator loops. These latter two work by indexing an object until an IndexError exception is caught by Python.

Type descriptors

The type descriptor tables (really, structs) that appear near the end of the file are the crux of the matter for types -- Python uses these tables to dispatch an operation performed on an instance object to the corresponding C handler function in this file. In fact, everything is routed through these tables; even method attribute lookups start by running a C stack_getattr function listed in the table (which in turn looks up the attribute name in a name/function-pointer table). The main Stacktype table includes a link to the supplemental stack_as_sequence table where sequence operation handlers are registered; types can provide such tables to register handlers for mapping, number, and sequence operation sets. See Python's integer and dictionary objects' source code for number and mapping examples; they are analogous to the sequence type here, but their operation tables vary.[5]

[5] Note that type descriptor layouts, like most C API tools, are prone to change over time, and you should always consult Include/object.h in the Python distribution for an up-to-date list of fields. Some new Python releases may also require that types written to work with earlier releases be recompiled to pick up descriptor changes. As always, see Python's extension manuals and its full source code distribution for more information and examples.

Constructor module

Besides defining a C type, this file also creates a simple C module at the end that exports a stacktype.Stack constructor function, which Python scripts call to generate new stack instance objects. The initialization function for this module is the only C name in this file that is not static (local to the file); everything else is reached by following pointers -- from instance, to type descriptor, to C handler function.

Again, see the book CD (see http://examples.oreilly.com/python2) for the full C stack type implementation. But to give you the general flavor of C type methods, here is what the C type's pop function looks like; compare this with the C module's pop function to see how the self argument is used to access per-instance information in types:

static PyObject *
stack_pop(self, args)
    stackobject *self;
    PyObject *args;                            /* on "instance.pop()" */
{
    PyObject *pstr;
    if (!PyArg_ParseTuple(args, ""))           /* verify no args passed */
        return NULL;
    if (self->top == 0)
        onError("stack underflow")             /* return NULL = raise */
    else {
        pstr = Py_BuildValue("s", self->stack[--self->top]);
        self->len -= (strlen(self->stack[self->top]) + 1);
        return pstr;
    }
}

19.7.2 Compiling and Running

This C extension file is compiled and dynamically or statically linked like previous examples; file makefile.stack on the CD (see http://examples.oreilly.com/python2) handles the build like this:

stacktype.so: stacktyp.c
	gcc stacktyp.c -g -I$(PY)/Include -I$(PY) -fpic -shared -o stacktype.so

Once compiled, you can import the C module and make and use instances of the C type it defines much as if it were a Python class (but without inheritance). You would normally do this from a Python script, but the interactive prompt is a convenient place to test the basics:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python
>>> import stacktype                            # import C constructor module
>>> x = stacktype.Stack(  )                       # make C type instance object
>>> x.push('new')                               # call C type methods
>>> x                                           # call C type print handler
[Stack:
0: 'new'
]

>>> x[0]                                        # call C type index handler
'new'
>>> y = stacktype.Stack(  )                       # make another type instance
>>> for c in 'SPAM': y.push(c)                  # a distinct stack object
...
>>> y
[Stack:
3: 'M'
2: 'A'
1: 'P'
0: 'S'
]

>>> z = x + y                                   # call C type concat handler
>>> z
[Stack:
4: 'M'
3: 'A'
2: 'P'
1: 'S'
0: 'new'
]

>>> y.pop(  )
'M'
>>> len(z), z[0], z[-1]                         # for loops work too (indexing)
(5, 'new', 'M')

19.7.3 Timing the C Implementations

So how did we do on the optimization front this time? Let's resurrect that timer module we wrote back in Example 17-6 to compare the C stack module and type to the Python stack module and classes we coded in Chapter 17. Example 19-16 calculates the system time in seconds that it takes to run tests on all of this book's stack implementations.

Example 19-16. PP2E\Integrate\Extend\Stacks\exttime.py
#!/usr/local/bin/python
# time the C stack module and type extensions
# versus the object chapter's Python stack implementations

from PP2E.Dstruct.Basic.timer  import test      # second count function
from PP2E.Dstruct.Basic import stack1           # python stack module 
from PP2E.Dstruct.Basic import stack2           # python stack class: +/slice
from PP2E.Dstruct.Basic import stack3           # python stack class: tuples
from PP2E.Dstruct.Basic import stack4           # python stack class: append/pop
import stackmod, stacktype                      # c extension type, module

from sys import argv
rept, pushes, pops, items = 200, 200, 200, 200  # default: 200 * (600 ops)
try:
    [rept, pushes, pops, items] = map(int, argv[1:])
except: pass
print 'reps=%d * [push=%d+pop=%d+fetch=%d]' % (rept, pushes, pops, items)

def moduleops(mod):
    for i in range(pushes): mod.push('hello')   # strings only for C
    for i in range(items):  t = mod.item(i)
    for i in range(pops):   mod.pop(  )

def objectops(Maker):                           # type has no init args
    x = Maker(  )                                 # type or class instance
    for i in range(pushes): x.push('hello')     # strings only for C
    for i in range(items):  t = x[i]
    for i in range(pops):   x.pop(  )

# test modules: python/c
print "Python module:", test(rept, moduleops, stack1)
print "C ext module: ", test(rept, moduleops, stackmod), '\n'

# test objects: class/type
print "Python simple Stack:", test(rept, objectops, stack2.Stack)  
print "Python tuple  Stack:", test(rept, objectops, stack3.Stack)
print "Python append Stack:", test(rept, objectops, stack4.Stack)
print "C ext type Stack:   ", test(rept, objectops, stacktype.Stack)    

Running this script on Linux produces the following results. As we saw before, the Python tuple stack is slightly better than the Python in-place append stack in typical use (when the stack is only pushed and popped), but it is slower when indexed. The first test here runs 200 repetitions of 200 stack pushes and pops, or 80,000 stack operations (200 x 400); times listed are test duration seconds:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python exttim.py 200 200 200 0
reps=200 * [push=200+pop=200+fetch=0]
Python module: 2.09
C ext module:  0.68

Python simple Stack: 2.15
Python tuple  Stack: 0.68
Python append Stack: 1.16
C ext type Stack:    0.5

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python exttim.py 100 300 300 0
reps=100 * [push=300+pop=300+fetch=0]
Python module: 1.86
C ext module:  0.52

Python simple Stack: 1.91
Python tuple  Stack: 0.51
Python append Stack: 0.87
C ext type Stack:    0.38

At least when there are no indexing operations on the stack as in these two tests (just pushes and pops), the C type is only slightly faster than the best Python stack (tuples). In fact, it's almost a draw -- in these first two tests, the C type reports only a tenth of a second speedup after 200 stacks and 80,000 stack operations. It's not exactly the kind of performance difference that would generate a bug report.[6]

[6] Interestingly, Python has gotten much faster since this book's first edition, relative to C. Back then, the C type was still almost three times faster than the best Python stack (tuples) when no indexing was performed. Today, it's almost a draw. One might infer from this that C migrations have become a third as important as they once were.

The C module comes in at roughly three times faster than the Python module, but these results are flawed. The stack1 Python module tested here uses the same slow stack implementation as the Python "simple" stack (stack2). If it was recoded to use the tuple stack representation used in Chapter 17, its speed would be similar to the "tuple" figures listed here, and almost identical to the speed of the C module in the first two tests:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python exttim.py 200 200 200 50
reps=200 * [push=200+pop=200+fetch=50]
Python module: 2.17
C ext module:  0.79

Python simple Stack: 2.24
Python tuple  Stack: 1.94
Python append Stack: 1.25
C ext type Stack:    0.52

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python exttim.py
reps=200 * [push=200+pop=200+fetch=200]
Python module: 2.42
C ext module:  1.1

Python simple Stack: 2.54
Python tuple  Stack: 19.09
Python append Stack: 1.54
C ext type Stack:    0.63

But under the different usage patterns simulated in these two tests, the C type wins the race. It is about twice as fast as the best Python stack (append) when indexing is added to the test mix, as illustrated by two of the preceding test runs that ran with a nonzero fetch count. Similarly, the C module would be twice as fast as the best Python module coding in this case as well.

In other words, the fastest Python stacks are as good as the C stacks if you stick to pushes and pops, but the C stacks are roughly twice as fast if any indexing is performed. Moreover, since you have to pick one representation, if indexing is possible at all you would likely pick the Python append stack; assuming they represent the best case, C stacks would always be twice as fast.

Of course, the measured time differences are so small that in many applications you won't care. Further, the C stacks are much more difficult to program, and achieve their speed by imposing substantial functional limits; in many ways, this is not quite an apples-to-apples comparison. But as a rule of thumb, C extensions can not only integrate existing components for use in Python scripts, they can also optimize time-critical components of pure Python programs. In other scenarios, migration to C might yield an even larger speedup.

On the other hand, C extensions should generally be used only as a last resort. As we learned earlier, algorithms and data structures are often bigger influences on program performance than implementation language. The fact that Python-coded tuple stacks are just as fast as the C stacks under common usage patterns speaks volumes about the importance of data structure representation.

19.7.4 Wrapping C Types in Classes

In the current Python implementation, to add inheritance to C types you must have a class somewhere. The most common way to support type customization is to introduce a wrapper class -- a Python class that does little but keep a reference to a type object and pass all operations off to the type. Because such a wrapper adds a class interface on top of the type, though, it allows the underlying type to be subclassed and extended as though the type was a class. This is illustrated in Example 19-17.

Example 19-17. PP2E\Integrate\Extend\Stacks\oopstack.py
import stacktype                                # get the C type/module
class Stack:
    def __init__(self, start=None):             # make/wrap a C type-instance
        self._base = start or stacktype.Stack(  ) # deleted when class-instance is
    def __getattr__(self, name):
        return getattr(self._base, name)        # methods/members: type-instance
    def __cmp__(self, other):
        return cmp(self._base, other)
    def __repr__(self):                         # 'print' is not really repr
        print self._base,; return ''
    def __add__(self, other):                   # operators: special methods
        return Stack(self._base + other._base)  # operators are not attributes
    def __mul__(self, n): 
        return Stack(self._base * n)            # wrap result in a new Stack
    def __getitem__(self, i): 
        return self._base[i]                    # 'item': index, in, for
    def __len__(self):
        return len(self._base)

This wrapper class can be used the same as the C type, because it delegates all operations to the type instance stored away in the class instance's self._base:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python
>>> import oopstack
>>> x = oopstack.Stack(  )
>>> y = oopstack.Stack(  )
>>> x.push('class')
>>> for c in "SPAM": y.push(c)
...
>>> x
[Stack:
0: 'class'
]

>>> y[2]
'A'
>>> z = x + y
>>> for s in z: print s,
...
class S P A M

>>> z.__methods__, z.__members__, z.pop(  )
(['empty', 'pop', 'push', 'top'], ['len'], 'M')
>>> type(z), type(z._base)
(<type 'instance'>, <type 'stack'>)

The point of coding such a wrapper is to better support extensions in Python. Subclasses really subclass the wrapper class, but because the wrapper is just a thin interface to the type, it's like subclassing the type itself, as in Example 19-18.

Example 19-18. PP2E\Integrate\Extend\Stacks\substack.py
from oopstack import Stack              # get the 'stub' class (C-type wrapper)

class Substack(Stack):
    def __init__(self, start=[]):       # extend the 'new' operation
        Stack.__init__(self)            # initialize stack from any sequence
        for str in start:               # start can be another stack too
            self.push(str)
    def morestuff(self):                # add a new method
        print 'more stack stuff'
    def __getitem__(self, i):           # extend 'item' to trace accesses
        print 'accessing cell', i
        return Stack.__getitem__(self, i)

This subclass extends the type (wrapper) to support an initial value at construction time, prints trace messages when indexed, and introduces a brand new morestuff method. This subclass is limited (e.g., the result of a + is a Stack, not a Substack), but proves the point -- wrappers let you apply inheritance and composition techniques we've met in this book to new types coded in C:

>>> import substack
>>> a = substack.Substack(x + y)
>>> a
[Stack:
4: 'M'
3: 'A'
2: 'P'
1: 'S'
0: 'class'
]

>>> a[3]
accessing cell 3
'A'
>>> a.morestuff(  )
more stack stuff
>>> b = substack.Substack("C" + "++")
>>> b.pop(), b.pop(  )
('+', '+')
>>> c = b + substack.Substack(['-', '-'])
>>> for s in c: print s,
...
C - -


19.7.5 But Don't Do That Either -- SWIG

You can code C types manually like this, but you don't necessarily have to. Because SWIG knows how to generate glue code for C++ classes, you can instead automatically generate all the C extension and wrapper class code required to integrate such a stack object, simply by running SWIG over an appropriate class declaration. The next section shows how.

    I l@ve RuBoard Previous Section Next Section