7.2. Module Loading

Module-loading operations rely on attributes of the built-in sys module (covered in "The sys Module" on page 168). The module-loading process described in this section is carried out by built-in function __import__. Your code can call __import__ directly, with the module name string as an argument. __import__ returns the module object or raises ImportError if the import fails.
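
As a minimal sketch of calling the function directly (the module names used here are only illustrative, and no_such_module_xyz is assumed not to exist on your system):

```python
# Import a module by passing its name as a string.
mod = __import__('string')
print(mod.__name__)              # the returned object is the module itself

# A failed import raises ImportError.
try:
    __import__('no_such_module_xyz')   # hypothetical name, assumed absent
except ImportError:
    failed = True
else:
    failed = False
print(failed)
```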

To import a module named M, __import__ first checks dictionary sys.modules, using string M as the key. When key M is in the dictionary, __import__ returns the corresponding value as the requested module object. Otherwise, __import__ binds sys.modules[M] to a new empty module object with a __name__ of M, then looks for the right way to initialize (load) the module, as covered in "Searching the Filesystem for a Module" on page 144.

Thanks to this mechanism, the relatively slow loading operation takes place only the first time a module is imported in a given run of the program. When a module is imported again, the module is not reloaded, since __import__ rapidly finds and returns the module's entry in sys.modules. Thus, all imports of a given module after the first one are very fast: they're just dictionary lookups. (To force a reload, see "The reload Function" on page 146.)
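
The caching behavior is easy to observe. The following sketch (using the standard string module purely as an example) shows that repeated imports hand back the very same object that sys.modules holds:

```python
import sys

first = __import__('string')     # loaded if needed, and recorded in sys.modules
second = __import__('string')    # found in sys.modules this time

same_object = first is second
cached = sys.modules['string'] is first
print(same_object)
print(cached)
```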

7.2.1. Built-in Modules

When a module is loaded, __import__ first checks whether the module is built-in. Built-in modules are listed in tuple sys.builtin_module_names, but rebinding that tuple does not affect module loading. When Python loads a built-in module, as when it loads any other extension, Python calls the module's initialization function. The search for built-in modules also looks for modules in platform-specific locations, such as resource forks and frameworks on the Mac, and the Registry in Windows.
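
For instance, you can check whether a given name denotes a built-in module. This sketch assumes the usual CPython layout, in which sys is compiled into the interpreter while string is an ordinary source file:

```python
import sys

# sys is compiled into the interpreter, so its name is in the tuple.
sys_is_builtin = 'sys' in sys.builtin_module_names

# string ships as a source file (in CPython), so it is not listed.
string_is_builtin = 'string' in sys.builtin_module_names

print(sys_is_builtin)
print(string_is_builtin)
```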

7.2.2. Searching the Filesystem for a Module

If module M is not built-in, __import__ looks for M's code as a file on the filesystem. __import__ examines the strings that are the items of list sys.path, in order. Each item is the path of a directory, or the path of an archive file in the popular ZIP format. sys.path is initialized at program startup, using environment variable PYTHONPATH (covered in "Environment Variables" on page 22), if present. The first item in sys.path is always the directory from which the main program (script) is loaded. An empty string in sys.path indicates the current directory.

Your code can mutate or rebind sys.path, and such changes affect which directories and ZIP archives __import__ searches to load modules. Changing sys.path does not affect modules that are already loaded (and thus already recorded in sys.modules) when you change sys.path.
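
A small sketch of both points follows. It writes a one-line module into a temporary directory, makes that directory importable by mutating sys.path, and then undoes the change. The module name path_demo_mod is hypothetical, chosen only to avoid clashing with real modules:

```python
import os
import sys
import tempfile

# Create a throwaway directory holding a one-line module.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'path_demo_mod.py'), 'w') as f:
    f.write('ANSWER = 42\n')

sys.path.insert(0, demo_dir)     # the directory is now searched first
try:
    import path_demo_mod
finally:
    sys.path.remove(demo_dir)    # undo the change to the search path

# The module is already in sys.modules, so removing its directory
# from sys.path afterwards does not unload it.
still_loaded = 'path_demo_mod' in sys.modules
print(path_demo_mod.ANSWER)
print(still_loaded)
```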

If a text file with extension .pth is found in the PYTHONHOME directory at startup, the file's contents are added to sys.path, one item per line. .pth files can contain blank lines and comment lines starting with the character #; Python ignores any such lines. .pth files can also contain import statements, which Python executes before your program starts to execute, but no other kinds of statements.

When looking for the file for module M in each directory and ZIP archive along sys.path, Python considers the following extensions in the order listed:

  1. .pyd and .dll (Windows) or .so (most Unix-like platforms), which indicate Python extension modules. (Some Unix dialects use different extensions; e.g., .sl is the extension used on HP-UX.) On most platforms, extensions cannot be loaded from a ZIP archive; only pure source or bytecode-compiled Python modules can.

  2. .py, which indicates pure Python source modules.

  3. .pyc (or .pyo, if Python is run with option -O), which indicates bytecode-compiled Python modules.

One last path at which Python looks for the file for module M is M/__init__.py, meaning a file named __init__.py in a directory named M, as covered in "Packages" on page 149.
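
The search order just described can be sketched as a small function. This is a simplified illustration, not what Python actually runs: find_module_file is a hypothetical helper that ignores ZIP archives, the -O option, and platform-specific extensions other than those listed above.

```python
import os
import tempfile

# Extensions in the order listed above (.pyd/.dll on Windows, .so on most Unix-like systems).
EXTENSIONS = ['.pyd', '.dll', '.so', '.py', '.pyc']

def find_module_file(name, search_path):
    """Return the first file that could supply module `name`, or None."""
    for directory in search_path:
        directory = directory or os.curdir    # '' means the current directory
        for ext in EXTENSIONS:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
        # Last possibility in this directory: a package, name/__init__.py
        candidate = os.path.join(directory, name, '__init__.py')
        if os.path.isfile(candidate):
            return candidate
    return None

# Usage: an empty spam.py in a temporary directory is found; eggs is not.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'spam.py'), 'w').close()
found = find_module_file('spam', [demo_dir])
missing = find_module_file('eggs', [demo_dir])
print(found)
print(missing)
```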

Upon finding source file M.py, Python compiles it to M.pyc (or M.pyo), unless the bytecode file is already present, is newer than M.py, and was compiled by the same version of Python. If M.py is compiled from a writable directory, Python saves the bytecode file to the filesystem in the same directory so that future runs will not needlessly recompile. When the bytecode file is newer than the source file (based on an internal timestamp in the bytecode file, not on trusting the date as recorded in the filesystem), Python does not recompile the module.

Once Python has the bytecode file, either from having constructed it by compilation or by reading it from the filesystem, Python executes the module body to initialize the module object. If the module is an extension, Python calls the module's initialization function.

7.2.3. The Main Program

Execution of a Python application normally starts with a top-level script (also known as the main program), as explained in "The python Program" on page 22. The main program executes like any other module being loaded, except that Python keeps the bytecode in memory without saving it to disk. The module name for the main program is always __main__, both as the __name__ global variable (module attribute) and as the key in sys.modules. You should not normally import the same .py file that is in use as the main program. If you do, the module is loaded again, and the module body is executed once more from the top in a separate module object with a different __name__.

Code in a Python module can test whether the module is being used as the main program by checking if global variable __name__ equals '__main__'. The idiom:

if __name__ == '__main__':

is often used to guard some code so that it executes only when the module is run as the main program. If a module is designed only to be imported, it should normally execute unit tests when it is run as the main program, as covered in "Unit Testing and System Testing" on page 452.

7.2.4. The reload Function

Python loads a module only the first time you import the module during a program run. When you develop interactively, you need to make sure your modules are reloaded each time you edit them (some development environments provide automatic reloading).

To reload a module, pass the module object (not the module name) as the only argument to built-in function reload. reload(M) ensures the reloaded version of M is used by client code that relies on import M and accesses attributes with the syntax M.A. However, reload(M) has no effect on other existing references bound to previous values of M's attributes (e.g., with a from statement). In other words, already bound variables remain bound as they were, unaffected by reload. reload's inability to rebind such variables is a further incentive to avoid from in favor of import.

reload is not recursive: when you reload a module M, this does not imply that other modules imported by M get reloaded in turn. You must specifically arrange to reload, by explicit calls to the reload function, each and every module you have modified.
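
The difference between the two access styles can be seen in a short experiment. This sketch writes a hypothetical module reload_demo_mod to a temporary directory, imports it both ways, edits the file, and reloads. Two details are assumptions rather than part of this section: the fallback to importlib.reload covers Python versions in which reload is no longer a built-in, and sys.dont_write_bytecode is set only so that a stale bytecode file cannot interfere with the demonstration.

```python
import os
import sys
import tempfile

try:
    from importlib import reload   # assumption: newer Pythons keep reload here
except ImportError:
    pass                           # otherwise reload is a built-in, as in the text

saved_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True     # keep bytecode files out of this demo

demo_dir = tempfile.mkdtemp()
source = os.path.join(demo_dir, 'reload_demo_mod.py')
with open(source, 'w') as f:
    f.write('VALUE = 1\n')

sys.path.insert(0, demo_dir)
try:
    import reload_demo_mod
    from reload_demo_mod import VALUE   # binds a separate name in this module

    with open(source, 'w') as f:        # "edit" the module on disk
        f.write('VALUE = 22\n')

    reload(reload_demo_mod)             # pass the module object, not its name
finally:
    sys.path.remove(demo_dir)
    sys.dont_write_bytecode = saved_flag

print(reload_demo_mod.VALUE)   # attribute access sees the reloaded value
print(VALUE)                   # the from-imported name keeps its old binding
```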

7.2.5. Circular Imports

Python lets you specify circular imports. For example, you can write a module a.py that contains import b, while module b.py contains import a. In practice, you are typically better off avoiding circular imports, since circular dependencies are always fragile and hard to manage. If you decide to use a circular import for some reason, you need to understand how circular imports work in order to avoid errors in your code.

Say that the main script executes import a. As discussed earlier, this import statement creates a new empty module object as sys.modules['a'], and then the body of module a starts executing. When a executes import b, this creates a new empty module object as sys.modules['b'], and then the body of module b starts executing. The execution of a's module body suspends until b's module body finishes.

Now, when b executes import a, the import statement finds sys.modules['a'] already defined, and therefore binds global variable a in module b to the module object for module a. Since the execution of a's module body is currently suspended, module a may be only partly populated at this time. If the code in b's module body immediately tries to access some attribute of module a that is not yet bound, an error results.
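
The sequence of events can be reproduced with two tiny files. In this sketch the module names circ_a and circ_b are hypothetical (they play the roles of a and b above), and circ_b catches the error so that the outcome can be inspected afterward:

```python
import os
import sys
import tempfile

demo_dir = tempfile.mkdtemp()

# circ_a imports circ_b before defining its own global.
with open(os.path.join(demo_dir, 'circ_a.py'), 'w') as f:
    f.write('import circ_b\n'
            'VALUE = 1\n')

# circ_b imports circ_a back and immediately looks for circ_a.VALUE.
with open(os.path.join(demo_dir, 'circ_b.py'), 'w') as f:
    f.write('import circ_a\n'
            'try:\n'
            '    seen = circ_a.VALUE\n'
            'except AttributeError:\n'
            '    seen = "not bound yet"\n')

sys.path.insert(0, demo_dir)
try:
    import circ_a
finally:
    sys.path.remove(demo_dir)

print(circ_a.circ_b.seen)   # what circ_b saw while circ_a was suspended
print(circ_a.VALUE)         # bound once circ_a's body resumed and finished
```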

If you do insist on keeping a circular import in some case, you must carefully manage the order in which each module defines its own globals, imports other modules, and accesses globals of other modules. You can gain greater control over the sequence in which things happen by grouping your statements into functions, and calling those functions in a controlled order, rather than just relying on sequential execution of top-level statements in module bodies. However, removing circular dependencies is almost always easier than ensuring bomb-proof ordering while keeping such circular dependencies. Since circular dependencies are also bad for other reasons, I recommend striving to remove them.

7.2.6. sys.modules Entries

The built-in __import__ function never binds anything other than a module object as a value in sys.modules. However, if __import__ finds an entry that is already in sys.modules, it returns that value, whatever type of object it may be. The import and from statements rely on the __import__ function, so they too can end up using objects that are not modules. This lack of type-checking is an advanced feature that was introduced several Python versions ago (very old versions of Python used to type-check, allowing only module objects as values in sys.modules). The feature lets you set class instances as entries in sys.modules, in order to exploit features such as __getattr__ and __setattr__ special methods, covered in "General-Purpose Special Methods" on page 104. This advanced technique lets you import module-like objects whose attributes can be computed on the fly. Here's a toy-like example:

class TT(object):
    def __getattr__(self, name): return 23
import sys
sys.modules[__name__] = TT()

When you import this code as a module, you get a module-like object (which in fact is an instance of this class TT) that appears to have any attribute name you try to get from it; all attribute names correspond to the integer value 23.
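
The same effect can be tried without a separate file by placing an instance into sys.modules yourself. In this sketch the class Constant and the entry name fake23 are hypothetical. Unlike the toy example, the class declines to answer for special names (those starting with two underscores); that is a cautious assumption, made so that the import machinery never mistakes 23 for module metadata.

```python
import sys

class Constant(object):
    """Stand-in for a module: every ordinary attribute is 23."""
    def __getattr__(self, name):
        if name.startswith('__'):
            raise AttributeError(name)   # leave special names alone
        return 23

sys.modules['fake23'] = Constant()       # no file named fake23 exists

import fake23                            # found in sys.modules, returned as is
from fake23 import anything_at_all       # from uses the same entry

print(type(fake23).__name__)
print(fake23.spam)
print(anything_at_all)
```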

7.2.7. Custom Importers

An advanced, rarely needed functionality that Python offers is the ability to change the semantics of some or all import and from statements.

7.2.7.1. Rebinding __import__

You can rebind the __import__ attribute of module __builtin__ to your own custom importer function (for example, one built with the generic built-in-wrapping technique shown in "Python built-ins" on page 141). Such a rebinding influences all import and from statements that execute after the rebinding and thus has possibly undesired global impacts. A custom importer built by rebinding __import__ must implement the same interface as the built-in __import__, and, in particular, it is responsible for supporting the correct use of sys.modules. While rebinding __import__ may initially look like an attractive approach, in most cases where custom importers are necessary, you will be better off implementing them via import hooks instead.
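
As a minimal sketch of the wrapping approach, the following code records every module name requested while the wrapper is installed, delegating all real work (including the handling of sys.modules) to the original function. The names logging_import and imported_names are hypothetical, and the fallback to a module named builtins is an assumption covering Python versions in which __builtin__ was renamed.

```python
try:
    import __builtin__ as builtins_module   # the name used in the text
except ImportError:
    import builtins as builtins_module      # assumption: renamed in newer Pythons

original_import = builtins_module.__import__
imported_names = []

def logging_import(name, *args, **kwargs):
    """Record each requested module name, then delegate to the real importer."""
    imported_names.append(name)
    return original_import(name, *args, **kwargs)

builtins_module.__import__ = logging_import
try:
    import string                # this statement goes through logging_import
finally:
    builtins_module.__import__ = original_import   # always undo the global change

print('string' in imported_names)
```

Restoring the original function in a finally clause keeps the global impact as short-lived as possible.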

7.2.7.2. Import hooks

Python offers rich support for selectively changing the details of imports, including standard library modules imp and ihooks. Custom importers are an advanced and rarely used technique, and yet some applications may need them for all sorts of purposes, including the import of code from archive files in formats that differ from ZIP files, from databases, from network servers, and so on. The most suitable approach for such highly advanced needs is to record importer factory callables as items in attributes meta_path and/or path_hooks of module sys, as detailed at http://www.python.org/peps/pep-0302.html. This approach is also the one that Python uses to hook up standard library module zipimport in order to allow seamless importing of modules from ZIP files, as previously mentioned. A full study of the details of PEP 302 is indispensable for any substantial use of sys.path_hooks and friends, but here's a simple, toy-level example that may help you understand the possibilities it offers, should you ever need them.

Suppose that while developing a first outline of some program, you want to be able to use import statements for modules that you haven't written yet, getting just warning messages (and empty modules) as a consequence. You can easily obtain such functionality (leaving aside the complexities connected with packages and dealing with simple modules only) by coding a custom importer module as follows:

import sys, new

class ImporterAndLoader(object):
    '''importer and loader functionality is usually in a single class'''
    fake_path = '!dummy!'
    def __init__(self, path):
        '''we only handle our own fake-path marker'''
        if path != self.fake_path: raise ImportError
    def find_module(self, fullname):
        '''we don't even try to handle any qualified module-name'''
        if '.' in fullname: return None
        return self
    def load_module(self, fullname):
        # emit some kind of warning messages
        print 'NOTE: module %r not written yet' % fullname
        # make new empty module, put it in sys.modules too
        mod = sys.modules[fullname] = new.module(fullname)
        # minimally populate new module and then return it
        mod.__file__ = 'dummy<%s>' % fullname
        mod.__loader__ = self
        return mod

# add the class to the hook and its fake-path marker to the path
sys.path_hooks.append(ImporterAndLoader)
sys.path.append(ImporterAndLoader.fake_path)

if __name__ == '__main__':      # self-test when run as main script
    import missing_module          # importing a simple missing module
    print sys.modules.get('missing_module')  #...should succeed
    # check that we don't deal with importing from packages
    try: import missing_module.sub_module
    except ImportError: pass       # expected: qualified names are not handled
    else: print 'Unexpected:', sys.modules.get('missing_module.sub_module')

