Previous Page
Next Page

8.4. The copy Module

As discussed in "Assignment Statements" on page 47, assignment in Python does not copy the righthand side object being assigned. Rather, assignment adds a reference to the righthand side object. When you want a copy of object x, you can ask x for a copy of itself, or you can ask x's type to make a new instance copied from x. If x is a list, list(x) returns a copy of x, as does x[:]. If x is a dictionary, dict(x) and x.copy( ) return a copy of x. If x is a set, set(x) and x.copy( ) return a copy of x. In each of these cases, my strong personal preference is for the uniform and readable idiom of calling the type, but there is no consensus on this style issue in the community of Python experts.

The copy module supplies a copy function to create and returns a copy of many types of objects. Normal copies, such as list(x) for a list x and copy.copy(x), are also known as shallow copies: when x has references to other objects (as items or attributes), a normal (shallow) copy of x has distinct references to the same objects. Sometimes, however, you need a deep copy, where referenced objects are copied recursively; fortunately, this requirement is rare, because a deep copy can take a lot of memory and time. Module copy supplies a deepcopy(x) function to create and return a deep copy.



Creates and returns a shallow copy of x, for x of many types (copies of several types, such as modules, classes, files, frames, and other internal types, are, however, not supported). If x is immutable, copy.copy(x) may return x itself as an optimization. A class can customize the way copy.copy copies its instances by having a special method _ _copy_ _(self) that returns a new object, a copy of self.



Makes a deep copy of x and returns it. Deep copying implies a recursive walk over a directed (not necessarily acyclic) graph of references. A special precaution is needed to preserve the graph's shape: when references to the same object are met more than once during the walk, distinct copies must not be made. Rather, references to the same copied object must be used. Consider the following simple example:

sublist = [1,2]
original = [sublist, sublist]
thecopy = copy.deepcopy(original)

original[0] is original[1] is true (i.e., the two items of list original refer to the same object). This is an important property of original and therefore must be preserved in anything that claims to be a copy of it. The semantics of copy.deepcopy are defined to ensure that thecopy[0] is thecopy[1] is also true in this case. In other words, the shapes of the graphs of references of original and thecopy are the same. Avoiding repeated copying has an important beneficial side effect: it prevents infinite loops that would otherwise occur when the graph of references has cycles.

copy.deepcopy accepts a second, optional argument memo, which is a dictionary that maps the id( ) of objects already copied to the new objects that are their copies. memo is passed by all recursive calls of deepcopy to itself, and you may also explicitly pass it (normally as an originally empty dictionary) if you need to maintain such a correspondence map between the identities of originals and copies of objects.

A class can customize the way copy.deepcopy copies its instances by having a special method _ _deepcopy_ _(self, memo) that returns a new object, a deep copy of self. When _ _deepcopy_ _ needs to deep copy some referenced object subobject, it must do so by calling copy.deepcopy(subobject, memo). When a class has no special method _ _deepcopy_ _, copy.deepcopy on an instance of that class also tries to call special methods _ _getinitargs_ _, _ _getnewargs_ _, _ _getstate_ _, and _ _setstate_ _, which are covered in "Pickling instances" on page 282.

Previous Page
Next Page