I l@ve RuBoard Previous Section Next Section

2.3 System Scripting Overview

The next two sections will take a quick tour through sys and os, before this chapter moves on to larger system programming concepts. As I'm not going to demonstrate every item in every built-in module, the first thing I want to do is show you how to get more details on your own. Officially, this task also serves as an excuse for introducing a few core system scripting concepts -- along the way, we'll code a first script to format documentation.

2.3.1 Python System Modules

Most system-level interfaces in Python are shipped in just two modules: sys and os. That's somewhat oversimplified; other standard modules belong to this domain too (e.g., glob, socket, thread, time, fcntl), and some built-in functions are really system interfaces as well (e.g., open). But sys and os together form the core of Python's system tools arsenal.

In principle at least, sys exports components related to the Python interpreter itself (e.g., the module search path), and os contains variables and functions that map to the operating system on which Python is run. In practice, this distinction may not always seem clear-cut (e.g., the standard input and output streams show up in sys, but they are at least arguably tied to operating system paradigms). The good news is that you'll soon use the tools in these modules so often that their locations will be permanently stamped on your memory.[1]

[1] They may also work their way into your subconscious. Python newcomers sometimes appear on Internet discussion forums to express joy after "dreaming in Python" for the first time. All possible Freudian interpretations aside, it's not bad as dream motifs go; after all, there are worse languages to dream in.

The os module also attempts to provide a portable programming interface to the underlying operating system -- its functions may be implemented differently on different platforms, but they look the same everywhere to Python scripts. In addition, the os module exports a nested submodule, os.path, that provides a portable interface to file and directory processing tools.
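Here is a rough illustration of that division of labor. It is a sketch, not a tour of either module. The code reads a few interpreter-related values from sys and a few platform-related ones from os and os.path. The function name describe_environment is made up for this example, but the sys and os attributes it uses are standard. It uses print() calls and string methods so it also runs under newer Python releases.

```python
import os
import sys

def describe_environment():
    """Return a small dict of interpreter (sys) and OS (os) facts."""
    return {
        'python_version': sys.version.split()[0],   # interpreter-related: sys
        'search_path_entries': len(sys.path),        # module search path length
        'platform': sys.platform,                    # e.g., 'win32' or 'linux'
        'cwd': os.getcwd(),                          # OS-related: os
        'sep': os.sep,                               # this platform's directory separator
    }

# os.path builds and splits paths using the right separator for the platform
full = os.path.join('PP2E', 'System', 'more.py')
head, tail = os.path.split(full)

if __name__ == '__main__':
    for key, value in sorted(describe_environment().items()):
        print('%s: %s' % (key, value))
    print(full)
    print(tail)
```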

2.3.2 Module Documentation Sources

As you can probably deduce from the preceding paragraphs, learning to write system scripts in Python is mostly a matter of learning about Python's system modules. Luckily, there are a variety of information sources to make this task easier -- from module attributes to published references and books.

For instance, if you want to know everything that a built-in module exports, you can either read its library manual entry, study its source code (Python is open source software, after all), or fetch its attribute list and documentation string interactively. Let's import sys and see what it's got:

C:\...\PP2E\System> python
>>> import sys
>>> dir(sys)
['__doc__', '__name__', '__stderr__', '__stdin__', '__stdout__', 'argv',
'builtin_module_names', 'copyright', 'dllhandle', 'exc_info', 'exc_type',
'exec_prefix', 'executable', 'exit', 'getrefcount', 'hexversion', 'maxint',
'modules', 'path', 'platform', 'prefix', 'ps1', 'ps2', 'setcheckinterval',
'setprofile', 'settrace', 'stderr', 'stdin', 'stdout', 'version', 'winver']

The dir function simply returns a list containing the string names of all the attributes in any object with attributes; it's a handy memory-jogger for modules at the interactive prompt. For example, we know there is something called sys.version, because the name version came back in the dir result. If that's not enough, we can always consult the __doc__ string of built-in modules:

>>> sys.__doc__ 
 ...lots of text deleted here...
count for an object (plus one :-)\012setcheckinterval() -- control how often 
the interpreter checks for events\012setprofile() -- set the global profiling
function\012settrace() -- set the global debug tracing function\012"
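You don't have to scan the full dir list by eye. A couple of tiny helpers can filter it and pull out just the first line of an object's __doc__. Both function names are hypothetical, and the sketch relies only on the standard dir and getattr built-ins and string methods.

```python
import sys

def public_names(obj):
    """Return obj's attribute names, minus those starting with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]

def first_doc_line(obj):
    """Return the first non-blank line of obj.__doc__, or '' if it has none."""
    doc = getattr(obj, '__doc__', None) or ''
    for line in doc.split('\n'):
        if line.strip():
            return line.strip()
    return ''

if __name__ == '__main__':
    print(public_names(sys))
    print(first_doc_line(sys))
```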

2.3.3 Paging Documentation Strings

The __doc__ built-in attribute usually contains a string of documentation, but it may look a bit weird when echoed at the interactive prompt. It's one long string whose embedded line-feed characters show up as \012 escapes, not a nice list of lines. To format these strings for more humane display, I usually use a utility script like the one in Example 2-1.

Example 2-1. PP2E\System\more.py
# split and interactively page a string or file of text;

import string

def more(text, numlines=15):
    lines = string.split(text, '\n')
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print line
        if lines and raw_input('More?') not in ['y', 'Y']: break 

if __name__ == '__main__':
    import sys                             # when run, not imported
    more(open(sys.argv[1]).read(), 10)       # page contents of file on cmdline

The meat of this file is its more function, and if you know any Python at all, it should be fairly straightforward -- it simply splits up a string around end-of-line characters, and then slices off and displays a few lines at a time (15 by default) to avoid scrolling off the screen. A slice expression lines[:15] gets the first 15 items in a list, and lines[15:] gets the rest; to show a different number of lines each time, pass a number to the numlines argument (e.g., the last line in Example 2-1 passes 10 to the numlines argument of the more function).
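Because more calls raw_input directly, it's awkward to exercise without a keyboard. To see the slicing logic on its own, you can write a variant that takes the prompting function as an argument and collects lines instead of printing them. more_with below is a hypothetical test-friendly rewrite, not a replacement for Example 2-1. Its ask argument stands in for raw_input.

```python
def more_with(text, ask, numlines=15):
    """Return the lines more() would show, calling ask('More?') between pages.

    ask stands in for raw_input: it receives the prompt and returns a reply.
    """
    shown = []
    lines = text.split('\n')
    while lines:
        chunk = lines[:numlines]      # first numlines items
        lines = lines[numlines:]      # the rest, saved for later pages
        shown.extend(chunk)           # collect instead of printing
        if lines and ask('More?') not in ['y', 'Y']:
            break
    return shown

text = '\n'.join(['line %d' % i for i in range(1, 8)])   # seven lines
everything = more_with(text, lambda prompt: 'y', 3)      # always answer yes
first_page = more_with(text, lambda prompt: 'n', 3)      # quit after page one
```

With three lines per page, everything holds all seven lines and first_page holds only the first three. That matches what a user typing y or n at the prompt would see.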

The string.split call this script employs returns a list of substrings (e.g., ["line", "line",...]). As we'll see later in this chapter, the end-of-line character is always \n (which is \012 in octal escape form) within a Python script, no matter what platform it is run upon. (If you don't already know why this matters: when text files are read on DOS and Windows, the \r in each \r\n line ending is dropped, so lines end in \n alone.)
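The snippet below makes this concrete with an in-memory string rather than a real file, and the sample text is made up. It shows that '\n' and '\012' are the same character. It also shows that splitting DOS-style text on '\n' alone leaves a stray '\r' on each line, which is what text-mode reads on Windows remove for you.

```python
same_char = '\n' == '\012'              # True: octal escape for line feed

dos_text = 'line1\r\nline2\r\n'         # DOS line endings, as stored on disk
raw_split = dos_text.split('\n')        # ['line1\r', 'line2\r', '']
clean_split = dos_text.replace('\r\n', '\n').split('\n')   # ['line1', 'line2', '']

if __name__ == '__main__':
    print(same_char)
    print(raw_split)
    print(clean_split)
```

Note the empty string at the end of each result. split yields one more item than there are separators, so text that ends in a newline produces a trailing ''.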

2.3.4 Introducing the string Module

Now, this is a simple Python program, but it already brings up three important topics that merit quick detours here: it uses the string module, reads from a file, and is set up to be run or imported. The Python string module isn't a system-related tool per se, but it sees action in most Python programs. In fact, it is going to show up throughout this chapter and those that follow, so here is a quick review of some of its more useful exports. The string module includes calls for searching and replacing:

>>> import string
>>> string.find('xxxSPAMxxx', 'SPAM')            # return first offset
3
>>> string.replace('xxaaxxaa', 'aa', 'SPAM')     # global replacement
'xxSPAMxxSPAM'
>>> string.strip('\t  Ni\n')                     # remove whitespace
'Ni'

The string.find call returns the offset of the first occurrence of a substring, and string.replace does global search and replacement. With this module, substrings are just strings; in Chapter 18, we'll also see modules that allow regular expression patterns to show up in searches and replacements. The string module also provides constants and functions useful for things like case conversions:

>>> string.lowercase                             # case constants, converters
'abcdefghijklmnopqrstuvwxyz'
>>> string.lower('SHRUBBERRY')
'shrubberry'
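The session above leaves out a few details. They are sketched here with the string methods described at the end of this section. find returns -1 instead of raising an error when the substring is missing. replace accepts an optional count that limits how many occurrences change. Lowercasing both strings gives a simple case-insensitive search. find_nocase is a hypothetical helper and assumes plain ASCII text.

```python
s = 'xxxSPAMxxx'
where = s.find('SPAM')                          # 3: offset of the first match
missing = s.find('EGGS')                        # -1: no match, no exception
once = 'xxaaxxaa'.replace('aa', 'SPAM', 1)      # 'xxSPAMxxaa': first match only

def find_nocase(text, sub):
    """Offset of sub in text ignoring case, or -1 (ASCII text assumed)."""
    return text.lower().find(sub.lower())

pos = find_nocase('We want a SHRUBBERRY', 'shrubberry')   # 10
```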

There are also tools for splitting up strings around a substring delimiter and putting them back together with a substring between. We'll explore these tools later in this book, but as an introduction, here they are at work:

>>> string.split('aaa+bbb+ccc', '+')             # split into substrings list
['aaa', 'bbb', 'ccc']
>>> string.split('a b\nc\nd')                    # default delimiter: whitespace
['a', 'b', 'c', 'd']

>>> string.join(['aaa', 'bbb', 'ccc'], 'NI')     # join substrings list
'aaaNIbbbNIccc'
>>> string.join(['A', 'dead', 'parrot'])         # default delimiter: space
'A dead parrot'

These calls turn out to be surprisingly powerful. For example, a line of data columns separated by tabs can be parsed into its columns with a single split call; the more.py script uses it to split a string into a list of line strings. In fact, we can emulate the string.replace call with a split/join combination:

>>> string.join(string.split('xxaaxxaa', 'aa'), 'SPAM')   # replace the hard way
'xxSPAMxxSPAM'
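Both patterns from the preceding paragraph fit in a line or two each. parse_columns and replace_hard_way are hypothetical names for this sketch. It uses string methods, where sep.join(items) is the method form of string.join(items, sep).

```python
def parse_columns(line, sep='\t'):
    """Split one line of sep-delimited data into column strings."""
    return line.rstrip('\n').split(sep)

def replace_hard_way(text, old, new):
    """Global replacement via split and join, as in the session above."""
    return new.join(text.split(old))

row = parse_columns('Bob\t42\tdeveloper\n')          # ['Bob', '42', 'developer']
fixed = replace_hard_way('xxaaxxaa', 'aa', 'SPAM')   # 'xxSPAMxxSPAM'
```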

For future reference, also keep in mind that Python doesn't automatically convert strings to numbers, or vice versa; if you want to use one like the other, you must say so, with manual conversions:

>>> string.atoi("42"), int("42"), eval("42")     # string to int conversions
(42, 42, 42)

>>> str(42), `42`, ("%d" % 42)                   # int to string conversions
('42', '42', '42')

>>> "42" + str(1), int("42") + 1                 # concatenation, addition
('421', 43)

In the last command here, the first expression triggers string concatenation (since both sides are strings) and the second invokes integer addition (because both objects are numbers). Python doesn't assume you meant one or the other and convert automatically; as a rule of thumb, Python tries to avoid magic when possible. String tools will be covered in more detail later in this book (in fact, they get a full chapter in Part IV), but be sure to also see the library manual for additional string module tools.
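Because conversions are explicit, a script that reads numbers from text has to decide what to do with input that isn't numeric. One approach is to catch the ValueError that int raises, sketched below with a hypothetical to_int helper. int is also a safer choice than eval for untrusted text, since eval will run any expression it's given. The second part shows that mixing a string and a number raises TypeError instead of guessing.

```python
def to_int(text, default=None):
    """Convert text to an int, or return default if it isn't a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

try:
    '42' + 1                        # str + int: Python refuses to guess
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True
```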

As of Python 1.6, string objects have grown methods corresponding to functions in the string module. For instance, given a name X assigned to a string object, X.split() now does the same work as string.split(X). In Example 2-1, that means that these two lines would be equivalent:

lines = string.split(text, '\n')
lines = text.split('\n')

but the latter form doesn't require an import statement. The string module will still be around for the foreseeable future and beyond, but string methods are likely to be the next wave in the Python text-processing world.
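One wrinkle when moving to methods is that join is called on the separator string, not on the list. So string.join(words, sep) becomes sep.join(words). Here is a short sketch; the sample data is made up.

```python
words = ['A', 'dead', 'parrot']
spaced = ' '.join(words)                         # like string.join(words)
ni = 'NI'.join(['aaa', 'bbb', 'ccc'])            # like string.join([...], 'NI')

# Methods chain naturally: split a comma list, then trim each field
fields = [field.strip() for field in ' a, b ,c '.split(',')]   # ['a', 'b', 'c']
upper = 'shrubberry'.upper()                     # like string.upper('shrubberry')
```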

2.3.5 File Operation Basics

The more.py script also opens the external file whose name is listed on the command line with the built-in open function, and reads its text into memory all at once with the file object read method. Since file objects returned by open are part of the core Python language itself, I assume that you have at least a passing familiarity with them at this point in the text. But just in case you've flipped into this chapter early on in your Pythonhood, the calls:

open('file').read()             # read entire file into string
open('file').read(N)            # read next N bytes into string
open('file').readlines()        # read entire file into line strings list
open('file').readline()         # read next line, through '\n'

load a file's contents into a string, load a fixed-size set of bytes into a string, load a file's contents into a list of line strings, and load the next line in the file into a string, respectively. As we'll see in a moment, these calls can also be applied to the output of shell commands run from Python with os.popen. File objects also have write methods for sending strings to the associated file. File-related topics are covered in depth later in this chapter.
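To try these calls without touching any real data, the sketch below writes a scratch file and reads it back each way. demo_file_reads is a hypothetical helper. It uses tempfile.mkstemp for a throwaway file name and the with statement (standard since Python 2.6) to close each file promptly.

```python
import os
import tempfile

def demo_file_reads():
    """Write a three-line scratch file, then read it back with each method."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)                                 # reopen by name below
    try:
        with open(path, 'w') as f:
            f.write('line one\nline two\nline three\n')
        with open(path) as f:
            whole = f.read()                     # entire file as one string
        with open(path) as f:
            first5 = f.read(5)                   # first 5 characters: 'line '
        with open(path) as f:
            lines = f.readlines()                # list of lines, each ending in '\n'
        with open(path) as f:
            first = f.readline()                 # one line at a time...
            second = f.readline()                # ...continuing where the last left off
    finally:
        os.remove(path)                          # clean up the scratch file
    return whole, first5, lines, first, second
```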

2.3.6 Using Programs Two Ways

The last few lines in the more.py file also introduce one of the first big concepts in shell tool programming. They instrument the file to be used two ways: as a script or as a library. Every Python module has a built-in __name__ variable that Python sets to the string '__main__' only when the file is run as a program, not when it is imported as a library. Because of that, the last line in the file calls the more function automatically when this script is run as a top-level program, but not when it is imported elsewhere. This simple trick turns out to be one key to writing reusable script code: by coding program logic as functions instead of top-level code, we can also import and reuse that logic in other scripts.
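The same pattern works for any script. Here is a minimal skeleton that follows it. line_count and main are hypothetical names, and putting command-line handling in a main(argv) function is one common style, not something Python requires.

```python
import sys

def line_count(text):
    """Library function: number of lines in a string."""
    return len(text.splitlines())

def main(argv):
    """Script entry point: print the line count of each file named in argv[1:]."""
    for name in argv[1:]:
        with open(name) as f:
            print('%s: %d' % (name, line_count(f.read())))

if __name__ == '__main__':      # true only when run, not when imported
    main(sys.argv)
```

Suppose this skeleton were saved as a hypothetical count.py. Running python count.py more.py would print a count. Another script could instead say from count import line_count and call the function directly.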

The upshot is that we can either run more.py by itself, or import and call its more function elsewhere. When running the file as a top-level program, we list the name of a file to be read and paged on the command line: as we'll describe fully later in this chapter, words typed in the command used to start a program show up in the built-in sys.argv list in Python. For example, here is the script file in action paging itself (be sure to type this command line in your PP2E\System directory, or it won't find the input file; I'll explain why later):

C:\...\PP2E\System>python more.py more.py
# split and interactively page a string or file of text;

import string

def more(text, numlines=15):
    lines = string.split(text, '\n')
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print line
More?y
        if lines and raw_input('More?') not in ['y', 'Y']: break

if __name__ == '__main__':
    import sys                             # when run, not imported
    more(open(sys.argv[1]).read(), 10)       # page contents of file on cmdline

When the more.py file is imported, we pass an explicit string to its more function, and this is exactly the sort of utility we need for documentation text. Running this utility on the sys module's documentation string gives us a bit more information about what's available to scripts, in human-readable form:

>>> from more import more
>>> more(sys.__doc__)
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.

Dynamic objects:

argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules
exitfunc -- you may set this to a function to be called when Python exits

stdin -- standard input file object; used by raw_input() and input()
stdout -- standard output file object; used by the print statement
stderr -- standard error object; used for error messages
  By assigning another file object (or an object that behaves like a file)
  to one of these, it is possible to redirect all of the interpreter's I/O.
More?

Pressing "y" (and the Enter key) here makes the function display the next few lines of documentation, and then prompt again unless you've run past the end of the lines list. Try this on your own machine to see what the rest of the module's documentation string looks like.
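The sys documentation shown above notes that assigning a file-like object to sys.stdout redirects the interpreter's output. A minimal sketch of that idea follows. Collector and capture_output are hypothetical names. What print mainly needs from Collector is a write method, and a no-op flush is included in case callers flush.

```python
import sys

class Collector:
    """A bare-bones file-like object that remembers everything written to it."""
    def __init__(self):
        self.parts = []
    def write(self, text):
        self.parts.append(text)
    def flush(self):
        pass                                # nothing buffered, nothing to do
    def getvalue(self):
        return ''.join(self.parts)

def capture_output(func, *args):
    """Call func(*args) with sys.stdout redirected; return what it printed."""
    saved = sys.stdout
    sys.stdout = collector = Collector()
    try:
        func(*args)
    finally:
        sys.stdout = saved                  # always restore the real stream
    return collector.getvalue()
```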

2.3.7 Python Library Manuals

If that still isn't enough detail, your next step is to read the Python library manual's entry for sys to get the full story. All of Python's standard manuals ship as HTML pages, so you should be able to read them in any web browser you have on your computer. They are on this book's CD (view CD-ROM content online at http://examples.oreilly.com/python2) and are installed with Python on Windows. Here are a few simple pointers for finding them:

  • On Windows, click the Start button, pick Programs, select the Python entry there, and then choose the manuals item. The manuals should magically appear on your display within a browser like Internet Explorer.

  • On Linux, you may be able to click on the manuals' entries in a file explorer, or start your browser from a shell command line and navigate to the library manual's HTML files on your machine.

  • If you can't find the manuals on your computer, you can always read them online. Go to Python's web site, http://www.python.org, and follow the documentation links.

However you get started, be sure to pick the "Library" manual for things like sys; Python's standard manual set also includes a short tutorial, language reference, extending references, and more.

2.3.8 Commercially Published References

At the risk of sounding like a marketing droid, I should mention that you can also purchase the Python manual set, printed and bound; see the book information page at http://www.python.org for details and links. Commercially published Python reference books are also available today, including Python Essential Reference (New Riders Publishing) and Python Pocket Reference (O'Reilly). The former is more complete and comes with examples, but the latter serves as a convenient memory-jogger once you've taken a library tour or two.[2] Also watch for O'Reilly's upcoming book Python Standard Library.

[2] I also wrote the latter as a replacement for the reference appendix that appeared in the first edition of this book; it's meant to be a supplement to the text you're reading. Since I'm its author, though, I won't say more here . . . except that you should be sure to pick up a copy for friends, coworkers, old college roommates, and every member of your extended family the next time you're at the bookstore (yes, I'm kidding).
