8.5. The collections Module
The collections module (introduced in Python 2.4) is intended to eventually supply several interesting types that are collections (i.e., containers).
In Python 2.4, the collections module supplies only one type, deque, whose instances are "double-ended queues" (i.e., sequence-like containers suitable for additions and removals at both ends). Call deque with a single argument, any iterable, to obtain a new deque instance whose items are those of the iterable in the same order, or call deque without arguments to obtain a new empty deque instance. A deque instance d is a mutable sequence and thus can be indexed and iterated on (however, d cannot be sliced, only indexed one item at a time, whether for access, rebinding, or deletion). A deque instance d supplies the following methods.
In Python 2.5, the collections module also supplies type defaultdict. defaultdict subclasses dict and adds one per-instance attribute, named default_factory. When an instance d of defaultdict has None as the value of d.default_factory, d behaves exactly like a normal dictionary. Otherwise, d.default_factory must be callable without arguments, and d behaves like a normal dictionary except when d is indexed with a key k that is not in d. In this specific case, the indexing d[k] calls d.default_factory( ), assigns the result as the value of d[k], and returns the result. In other words, type defaultdict behaves like the following Python-coded class:
class defaultdict(dict): def _ _init_ _(self, default_factory, *a, **k): dict._ _init_ _(self, *a, **k) self.default_factory = default_factory def _ _getitem_ _(self, key): if key not in self and self.default_factory is not None: self[key] = self.default_factory( ) return dict._ _getitem_ _(self, key)
As this Python equivalent implies, to instantiate defaultdict you pass it an extra first argument (before any other arguments, positional and/or named, if any, to be passed on to plain dict). That extra first argument becomes the initial value of default_factory. Other behavior of defaultdict is also essentially as implied by this Python equivalent (with the exception of str and repr, which return strings that are different from those they would return for a plain dict). You can read and rebind default_factory, other methods such as get and pop are not affected, all behavior related to the set of keys (method keys, iteration, membership test via operator in, etc.) reflects exactly the keys that are currently in the container (whether you put them there explicitly, or implicitly via indexing calling default_factory), and so forth.
A typical use of a defaultdict is as a bag (also known as a multiset), a dictionary-like object that keeps a count that corresponds to each key; for such a use, the natural choice for default_factory is int, since calling int( ) returns 0 (the correct count for a key that is not yet in the bag). For example:
import collections, operator def sorted_histogram(seq): d = collections.defaultdict(int) for item in seq: d[item] += 1 return sorted(d.iteritems, key=operator.itemgetter(1), reverse=True)
Called with any iterable whose items are hashable, this sorted_histogram function returns a list of pairs, each of the form (item,count), giving all distinct items in the iterable, each associated with its number of occurrences, sorted in reverse order of counts (items seen most often come first). Another typical use of defaultdict is to set default_factory to list, to make a mapping from keys to lists of values:
def multi_dict(items): d = collections.defaultdict(list) for key, value in items: d[key].append(value) return d
Called with any iterable whose items are pairs of the form (key,value), with all keys being hashable, this multi_dict function returns a mapping that associates each key to the lists of values that accompanied it in the iterable (if you want a pure dict result, change the last statement into return dict(d); but this is generally not necessary).