
10.8. Filesystem Operations

Using the os module, you can manipulate the filesystem in a variety of ways: creating, copying, and deleting files and directories, comparing files, and examining filesystem information about files and directories. This section documents the attributes and methods of the os module that you use for these purposes, and covers some related modules that operate on the filesystem.

10.8.1. Path-String Attributes of the os Module

A file or directory is identified by a string, known as its path, whose syntax depends on the platform. On both Unix-like and Windows platforms, Python accepts Unix syntax for paths, with a slash (/) as the directory separator. On non-Unix-like platforms, Python also accepts platform-specific path syntax. On Windows, in particular, you may use a backslash (\) as the separator. However, you then need to double up each backslash as \\ in string literals, or use raw-string syntax as covered in "Literals" on page 37; you also needlessly lose portability. Unix path syntax is handier, and usable everywhere, so I strongly recommend that you always use it. In the rest of this chapter, for brevity, I assume Unix path syntax in both explanations and examples.

Module os supplies attributes that provide details about path strings on the current platform. You should typically use the higher-level path manipulation operations covered in "The os.path Module" on page 246 rather than lower-level string operations based on these attributes. However, the attributes may be useful at times.


curdir

The string that denotes the current directory ('.' on Unix and Windows)


defpath

The default search path for programs, used if the environment lacks a PATH environment variable


linesep

The string that terminates text lines ('\n' on Unix; '\r\n' on Windows)


extsep

The string that separates the extension part of a file's name from the rest of the name ('.' on Unix and Windows)


pardir

The string that denotes the parent directory ('..' on Unix and Windows)


pathsep

The separator between paths in lists of paths, such as those used for the environment variable PATH (':' on Unix; ';' on Windows)


sep

The separator of path components ('/' on Unix; '\\' on Windows)
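
As a minimal sketch of how these attributes might be used, the following snippet builds and splits a PATH-style string with os.pathsep instead of hardcoding ':' or ';', then prints each attribute's value on the current platform. The directory names are hypothetical, and the single-argument print( ) form is used only so that the snippet runs unchanged on both older and newer Python versions.

```python
import os

# Build a PATH-style string and take it apart again with os.pathsep,
# rather than hardcoding ':' or ';'.
search_dirs = ['/usr/local/bin', '/usr/bin', '/bin']   # hypothetical directories
path_value = os.pathsep.join(search_dirs)
recovered = path_value.split(os.pathsep)

# Show this platform's values for the attributes described above.
for name in ('curdir', 'pardir', 'sep', 'extsep', 'pathsep', 'linesep'):
    print('%-8s %r' % (name, getattr(os, name)))
```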

10.8.2. Permissions

Unix-like platforms associate nine bits with each file or directory: three each for the file's owner, its group, and anybody else, indicating whether the file or directory can be read, written, and executed by the given subject. These nine bits are known as the file's permission bits, and are part of the file's mode (a bit string that includes other bits that describe the file). These bits are often displayed in octal notation, with three bits in each digit. For example, mode 0664 indicates a file that can be read and written by its owner and group, and read, but not written, by anybody else. When any process on a Unix-like system creates a file or directory, the operating system applies to the specified mode a bit mask known as the process's umask, which can remove some of the permission bits.

Non-Unix-like platforms handle file and directory permissions in very different ways. However, the functions in Python's standard library that deal with file permissions accept a mode argument according to the Unix-like approach described in the previous paragraph. The implementation on each platform maps the nine permission bits in a way appropriate for the given platform. For example, on versions of Windows that distinguish only between read-only and read/write files and do not distinguish file ownership, a file's permission bits show up as either 0666 (read/write) or 0444 (read-only). On such a platform, when creating a file, the implementation looks only at bit 0200, making the file read/write if that bit is 0 or read-only if that bit is 1.
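
The nine permission bits and the umask arithmetic described in this section can be sketched as follows. The helper permission_string and the sample values are illustrative assumptions, not part of the standard library; the named constants come from module stat. The 0o prefix for octal literals needs Python 2.6 or later (the text above uses the older 0664 spelling).

```python
import stat

# The nine permission flags, from most to least significant bit.
_FLAGS = (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
          stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
          stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH)

def permission_string(mode):
    """Render the nine permission bits of mode in 'rwxrwxrwx' style."""
    chars = []
    for flag, letter in zip(_FLAGS, 'rwxrwxrwx'):
        if mode & flag:
            chars.append(letter)
        else:
            chars.append('-')
    return ''.join(chars)

requested = 0o666                # mode a program asks for when creating a file
umask = 0o022                    # a common umask: no write for group and others
effective = requested & ~umask   # the bits that survive the mask

print(permission_string(0o664))      # rw-rw-r--
print(permission_string(effective))  # rw-r--r--
```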

10.8.3. File and Directory Functions of the os Module

The os module supplies several functions to query and set file and directory status.

access

access(path, mode)

Returns true if file path has all of the permissions encoded in integer mode; otherwise, False. mode can be os.F_OK to test for file existence, or one or more of the constant integers named os.R_OK, os.W_OK, and os.X_OK (with the bitwise-OR operator | joining them, if more than one) to test permissions to read, write, and execute the file.

access does not use the standard interpretation for its mode argument, covered in "Permissions" on page 242. access tests only if this specific process's real user and group identifiers have the requested permissions on the file. If you need to study a file's permission bits in more detail, see function stat on page 244.
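
A minimal sketch of os.access, assuming a scratch file created with tempfile.mkstemp so that the current process owns it. The variable names are illustrative.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()      # scratch file owned by this process
os.close(fd)
try:
    exists = os.access(path, os.F_OK)
    # Join several permission tests with the bitwise-OR operator.
    readable_and_writable = os.access(path, os.R_OK | os.W_OK)
finally:
    os.remove(path)

# Once the file is removed, the existence test fails.
gone = not os.access(path, os.F_OK)
```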

chdir

chdir(path)

Sets the current working directory to path.

chmod

chmod(path, mode)

Changes the permissions of file path, as encoded in integer mode. mode can be zero or more of os.R_OK, os.W_OK, and os.X_OK (with the bitwise-OR operator | joining them, if more than one) to set permission to read, write, and execute. On Unix-like platforms, mode can also be a richer bit pattern (as covered in "Permissions" on page 242) to specify different permissions for user, group, and other.
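
A minimal sketch of chmod, assuming a Unix-like platform. It spells the richer bit pattern with the named permission constants of module stat (S_IRUSR and so on, which are not covered in the text above) and reads the result back with os.stat.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Read-only for owner, group, and others (octal 444).
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    read_only_mode = stat.S_IMODE(os.stat(path).st_mode)

    # Give the owner write permission back before removing the file.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
finally:
    os.remove(path)
```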

getcwd

getcwd( )

Returns the path of the current working directory.

listdir

listdir(path)

Returns a list whose items are the names of all files and subdirectories found in directory path. The returned list is in arbitrary order and does not include the special directory names '.' (current directory) and '..' (parent directory).

The dircache module also supplies a function named listdir, which works like os.listdir, with two enhancements. First, dircache.listdir returns a sorted list. Further, dircache caches the list it returns so that repeated requests for lists of the same directory are faster if the directory's contents have not changed in the meantime. dircache automatically detects changes: whenever you call dircache.listdir, you get a list that reflects the directory's contents at that time.
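
Since os.listdir returns names in arbitrary order, sorting the result yourself is one simple way to get a predictable listing. The sketch below assumes a scratch directory with hypothetical entry names; note that listdir returns bare names, so each must be joined with the directory path before further tests.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
for name in ('b.txt', 'a.txt'):
    open(os.path.join(base, name), 'w').close()   # create empty files
os.mkdir(os.path.join(base, 'sub'))

names = sorted(os.listdir(base))                  # bare names, now ordered
subdirs = [n for n in names if os.path.isdir(os.path.join(base, n))]

shutil.rmtree(base)
```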

makedirs, mkdir

makedirs(path, mode=0777)
mkdir(path, mode=0777)

makedirs creates all directories that are part of path and do not yet exist. mkdir creates only the rightmost directory of path and raises OSError if any of the previous directories in path do not exist. Both functions use mode as permission bits of directories they create. Both functions raise OSError if creation fails, or if a file or directory named path already exists.
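
The difference between the two functions can be sketched as follows, assuming a scratch directory and hypothetical directory names a, b, and c.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
nested = os.path.join(base, 'a', 'b', 'c')
try:
    try:
        os.mkdir(nested)          # fails: 'a' and 'a/b' do not exist yet
        mkdir_failed = False
    except OSError:
        mkdir_failed = True

    os.makedirs(nested)           # creates a, a/b, and a/b/c
    created = os.path.isdir(nested)

    try:
        os.makedirs(nested)       # fails: the path already exists
        second_call_failed = False
    except OSError:
        second_call_failed = True
finally:
    shutil.rmtree(base)
```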

remove, unlink

remove(path)
unlink(path)

Removes the file named path (see rmdir on page 244 to remove a directory). unlink is a synonym of remove.

removedirs

removedirs(path)

Loops from right to left over the directories that are part of path, removing each one. The loop ends when a removal attempt raises an exception, generally because a directory is not empty. removedirs does not propagate the exception, as long as it has removed at least one directory.

rename

rename(source, dest)

Renames the file or directory named source to dest.

renames

renames(source, dest)

Like rename, except that renames tries to create all intermediate directories needed for dest. After renaming, renames tries to remove empty directories from path source using removedirs. It does not propagate any resulting exception; it's not an error if the starting directory of source does not become empty after the renaming.
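
A minimal sketch of renames, assuming a scratch directory and hypothetical names. The destination's intermediate directories do not exist beforehand, and the source's directories become empty once the file has moved, so renames creates the former and prunes the latter.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
try:
    old_dir = os.path.join(base, 'old', 'deep')
    os.makedirs(old_dir)
    source = os.path.join(old_dir, 'data.txt')
    open(source, 'w').close()

    dest = os.path.join(base, 'new', 'place', 'data.txt')
    os.renames(source, dest)      # creates new/place, then prunes old/deep and old

    moved = os.path.isfile(dest)
    pruned = not os.path.exists(os.path.join(base, 'old'))
    base_survives = os.path.isdir(base)   # not empty, so pruning stops here
finally:
    shutil.rmtree(base)
```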

rmdir

rmdir(path)

Removes the empty directory named path (raises OSError if the removal fails, and, in particular, if the directory is not empty).

stat

stat(path)

Returns a value x of type stat_result, which provides 10 items of information about file or subdirectory path. Accessing those items by their numeric indices is generally not advisable because the resulting code is not very readable; use the corresponding attribute names instead. Table 10-2 lists the attributes of a stat_result instance and the meaning of corresponding items.

Table 10-2. Items (attributes) of a stat_result instance

Item index    Attribute name    Meaning
0             st_mode           Protection and other mode bits
1             st_ino            Inode number
2             st_dev            Device ID
3             st_nlink          Number of hard links
4             st_uid            User ID of owner
5             st_gid            Group ID of owner
6             st_size           Size in bytes
7             st_atime          Time of last access
8             st_mtime          Time of last modification
9             st_ctime          Time of last status change


For example, to print the size in bytes of file path, you can use any of:

import os
print os.path.getsize(path)
print os.stat(path)[6]
print os.stat(path).st_size

Time values are in seconds since the epoch, as covered in Chapter 12 (int on most platforms; float on very old versions of the Macintosh). Platforms unable to give a meaningful value for an item use a dummy value for that item.

tempnam, tmpnam

tempnam(dir=None, prefix=None)
tmpnam( )

Returns an absolute path usable as the name of a new temporary file. Note: tempnam and tmpnam are weaknesses in your program's security. Avoid these functions and use instead the standard library module tempfile, covered in "The tempfile Module" on page 223.

utime

utime(path, times=None)

Sets the accessed and modified times of file or directory path. If times is None, utime uses the current time. Otherwise, times must be a pair of numbers (in seconds since the epoch, as covered in Chapter 12) in the order (accessed, modified).
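
A minimal sketch of utime, assuming a scratch file and two arbitrary timestamps chosen only for illustration. The times are read back with os.stat.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    accessed = 1000000000      # arbitrary seconds since the epoch
    modified = 1100000000
    os.utime(path, (accessed, modified))   # order is (accessed, modified)

    info = os.stat(path)
    modified_after = int(info.st_mtime)
finally:
    os.remove(path)
```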

walk

walk(top, topdown=True, onerror=None)

A generator yielding an item for each directory in the tree whose root is directory top. When topdown is true, the default, walk visits directories from the tree's root downward; when topdown is False, walk visits directories from the tree's leaves upward. When onerror is None, walk catches and ignores any OSError exception raised during the tree-walk. Otherwise, onerror must be a function; walk catches any OSError exception raised during the tree-walk and passes it as the only argument in a call to onerror, which may process it, ignore it, or raise it to terminate the tree-walk and propagate the exception.

Each item walk yields is a tuple of three subitems: dirpath, a string that is the directory's path; dirnames, a list of names of subdirectories that are immediate children of the directory (special directories '.' and '..' are not included); and filenames, a list of names of files that are directly in the directory. If topdown is true, you can alter list dirnames in-place, removing some items and/or reordering others, to affect the tree-walk of the subtree rooted at dirpath; walk iterates only in subdirectories left in dirnames, in the order in which they're left. Such alterations have no effect if topdown is False (in this case, walk has already visited all subdirectories by the time it visits the current directory and yields its item).

A typical use of os.walk might be to print the paths of all files (not subdirectories) in a tree, skipping those parts of the tree whose root directories' names start with '.':

import os
for dirpath, dirnames, filenames in os.walk(tree_root_dir):
    # alter dirnames *in-place* to skip subdirectories named '.something'
    dirnames[:] = [d for d in dirnames if not d.startswith('.')]
    # print the path of each file
    for name in filenames:
        print os.path.join(dirpath, name)

If argument top is a relative path, then the body of a loop on the result of os.walk should not change the working directory, which might cause undefined behavior. os.walk itself never changes the working directory. To transform any name x, an item in dirnames or filenames, into a path, use os.path.join(dirpath, x); dirpath already starts with top.
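
The leaves-upward order that topdown=False gives is handy when each directory must be emptied before it can be removed. The following is a minimal sketch under that assumption, run on a small scratch tree with hypothetical names; for real work, shutil.rmtree does the same job.

```python
import os
import tempfile

# Build a small scratch tree: root/a/b/f.txt and root/a/g.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'a', 'b'))
open(os.path.join(root, 'a', 'b', 'f.txt'), 'w').close()
open(os.path.join(root, 'a', 'g.txt'), 'w').close()

visited = []
for dirpath, dirnames, filenames in os.walk(root, topdown=False):
    visited.append(os.path.relpath(dirpath, root))
    # By the time a directory is yielded, its subdirectories were
    # already visited (and emptied), so rmdir on them succeeds.
    for name in filenames:
        os.remove(os.path.join(dirpath, name))
    for name in dirnames:
        os.rmdir(os.path.join(dirpath, name))
os.rmdir(root)
```

os.path.relpath is used here only to record short names; it requires Python 2.6 or later.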


10.8.4. The os.path Module

The os.path module supplies functions to analyze and transform path strings. To use this module, you can import os.path; however, if you just import os, you can also access module os.path and all of its attributes.

abspath

abspath(path)

Returns a normalized absolute path string equivalent to path, just like:

os.path.normpath(os.path.join(os.getcwd( ), path))

For example, os.path.abspath(os.curdir) is the same as os.getcwd( ).

basename

basename(path)

Returns the base name part of path, just like os.path.split(path)[1]. For example, os.path.basename('b/c/d.e') returns 'd.e'.

commonprefix

commonprefix(list)

Accepts a list of strings and returns the longest string that is a prefix of all items in the list. Unlike all other functions in os.path, commonprefix works on arbitrary strings, not just on paths. For example, os.path.commonprefix(['foobar', 'foolish']) returns 'foo'.

dirname

dirname(path)

Returns the directory part of path, just like os.path.split(path)[0]. For example, os.path.dirname('b/c/d.e') returns 'b/c'.

exists

exists(path)

Returns True when path names an existing file or directory; otherwise, False. In other words, os.path.exists(x) is the same as os.access(x, os.F_OK).

expandvars

expandvars(path)

Returns a copy of string path, where each substring of the form $name or ${name} is replaced with the value of environment variable name. For example, if environment variable HOME is set to /u/alex, the following code:

import os
print os.path.expandvars('$HOME/foo/')

emits /u/alex/foo/.

getatime, getmtime, getsize

getatime(path)
getmtime(path)
getsize(path)

Each of these functions returns an attribute from the result of os.stat(path): respectively, st_atime, st_mtime, and st_size. See stat on page 244 for more details about these attributes.

isabs

isabs(path)

Returns true when path is absolute. A path is absolute when it starts with a slash (/), or, on some non-Unix-like platforms, with a drive designator followed by os.sep. When path is not absolute, isabs returns False.

isfile

isfile(path)

Returns true when path names an existing regular file (in Unix, however, isfile also follows symbolic links); otherwise, False.

isdir

isdir(path)

Returns True when path names an existing directory (in Unix, however, isdir also follows symbolic links); otherwise, False.

islink

islink(path)

Returns true when path names a symbolic link; otherwise (always on platforms that don't support symbolic links), islink returns False.

ismount

ismount(path)

Returns true when path names a mount point; otherwise (always on platforms that don't support mount points), ismount returns False.

join

join(path, *paths)

Returns a string that joins the argument strings with the appropriate path separator for the current platform. For example, on Unix, exactly one slash character / separates adjacent path components. If any argument is an absolute path, join ignores all previous components. For example:

print os.path.join('a/b', 'c/d', 'e/f')
# on Unix prints: a/b/c/d/e/f
print os.path.join('a/b', '/c/d', 'e/f')
# on Unix prints: /c/d/e/f

The second call to os.path.join ignores its first argument 'a/b', since its second argument '/c/d' is an absolute path.

normcase

normcase(path)

Returns a copy of path with case normalized for the current platform. On case-sensitive filesystems (as is typical in Unix-like systems), path is returned unchanged. On case-insensitive filesystems (as typical in Windows), all letters in the returned string are lowercase. On Windows, normcase also converts each / to a \.

normpath

normpath(path)

Returns a normalized pathname equivalent to path, removing redundant separators and path-navigation aspects. For example, on Unix, normpath returns 'a/b' when path is any of 'a//b', 'a/./b', or 'a/c/../b'. normpath makes path separators appropriate for the current platform. For example, on Windows, separators become \.

split

split(path)

Returns a pair of strings (dir, base) such that join(dir, base) equals path. base is the last pathname component and never contains a path separator. If path ends in a separator, base is ''. dir is the leading part of path, up to the last path separator, shorn of trailing separators. For example, os.path.split('a/b/c/d') returns the pair ('a/b/c', 'd').

splitdrive

splitdrive(path)

Returns a pair of strings (drv, pth) such that drv+pth equals path. drv is either a drive specification or ''. drv is always '' on platforms that do not support drive specifications, such as all Unix-like systems. For example, on Windows, os.path.splitdrive('c:d/e') returns the pair ('c:', 'd/e').

splitext

splitext(path)

Returns a pair of strings (root, ext) such that root+ext equals path. ext is either '' or starts with a '.' and has no other '.' or path separator. For example, os.path.splitext('a.a/b.c.d') returns the pair ('a.a/b.c', '.d').
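
The splitting functions combine naturally. The sketch below, which uses a hypothetical path, takes a path apart with split and splitext and puts it back together with join; the values in the comments are what these calls return for this input.

```python
import os.path

path = 'a/b/archive.tar.gz'                    # hypothetical path

directory, base = os.path.split(path)          # ('a/b', 'archive.tar.gz')
root, ext = os.path.splitext(base)             # ('archive.tar', '.gz')

# dirname and basename are shortcuts for the two halves of split.
same_halves = (os.path.dirname(path) == directory and
               os.path.basename(path) == base)

# On Unix, joining the pieces again gives back the original path.
rejoined = os.path.join(directory, root + ext)
```

Note that splitext splits off only the last extension, so a compound extension such as .tar.gz needs a second call on root.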

walk

walk(path, func, arg)

Calls func(arg, dirpath, namelist) for each directory in the tree whose root is directory path, starting with path itself. This function is complicated to use and obsolete; use, instead, generator os.walk, covered in walk on page 245.


10.8.5. The stat Module

Function os.stat (covered in stat on page 244) returns instances of stat_result, whose item indices, attribute names, and meaning are covered in Table 10-2. The stat module supplies attributes with names like those of stat_result's attributes, turned into uppercase, and corresponding values that are the corresponding item indices.

More interesting contents of module stat are functions that examine the st_mode attribute of a stat_result instance to determine the kind of file. os.path also supplies functions for such tasks, which operate directly on the file's path. The functions supplied by stat are faster when they perform several tests on the same file: they require only one os.stat call at the start of a series of tests, while the functions in os.path implicitly ask the operating system for the information at each test. Each function returns true if mode denotes a file of the given kind; otherwise, False.


S_ISDIR(mode)

Is the file a directory?


S_ISCHR(mode)

Is the file a special device-file of the character kind?


S_ISBLK(mode)

Is the file a special device-file of the block kind?


S_ISREG(mode)

Is the file a normal file (not a directory, special device-file, and so on)?


S_ISFIFO(mode)

Is the file a FIFO (i.e., a "named pipe")?


S_ISLNK(mode)

Is the file a symbolic link?


S_ISSOCK(mode)

Is the file a Unix-domain socket?

Except for stat.S_ISDIR and stat.S_ISREG, the other functions are meaningful only on Unix-like systems, since other platforms do not keep special files such as devices and sockets in the same filesystem as regular files, and don't provide symbolic links as directly as Unix-like systems do.

Module stat supplies two more functions that extract relevant parts of a file's mode (x.st_mode, for some result x of function os.stat).

S_IFMT

S_IFMT(mode)

Returns those bits of mode that describe the kind of file (i.e., those bits that are examined by functions S_ISDIR, S_ISREG, etc.).

S_IMODE

S_IMODE(mode)

Returns those bits of mode that can be set by function os.chmod (i.e., the permission bits and, on Unix-like platforms, other special bits such as the set-user-id flag).
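
The single-call pattern this section describes might look like the following sketch. The helper describe is a hypothetical name: it calls os.stat once, then applies several tests from module stat to the saved mode.

```python
import os
import stat
import tempfile

def describe(path):
    """Classify path with one os.stat call, then test the mode in memory."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        kind = 'directory'
    elif stat.S_ISREG(mode):
        kind = 'regular file'
    else:
        kind = 'other'
    # S_IMODE keeps only the bits that os.chmod can set.
    return kind, stat.S_IMODE(mode)

dir_path = tempfile.mkdtemp()
fd, file_path = tempfile.mkstemp(dir=dir_path)
os.close(fd)

dir_kind, dir_permissions = describe(dir_path)
file_kind, file_permissions = describe(file_path)

os.remove(file_path)
os.rmdir(dir_path)
```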


10.8.6. The filecmp Module

The filecmp module supplies functionality to compare files and directories.

cmp

cmp(f1, f2, shallow=True)

Compares the files named by path strings f1 and f2. If the files seem equal, cmp returns true; otherwise, False. If shallow is true, files are "equal" if their stat tuples are. If shallow is false, cmp reads and compares files whose stat tuples are equal.
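
A minimal sketch of filecmp.cmp with shallow set to false, so that file contents are read when needed. The helper write_text and the file names are illustrative assumptions; the two differing files are deliberately the same size.

```python
import filecmp
import os
import shutil
import tempfile

def write_text(path, text):
    """Create (or overwrite) path with the given text."""
    f = open(path, 'w')
    try:
        f.write(text)
    finally:
        f.close()

base = tempfile.mkdtemp()
try:
    one = os.path.join(base, 'one.txt')
    two = os.path.join(base, 'two.txt')
    three = os.path.join(base, 'three.txt')
    write_text(one, 'spam\n')
    write_text(two, 'spam\n')
    write_text(three, 'eggs\n')     # same size as the others, different bytes

    one_equals_two = filecmp.cmp(one, two, shallow=False)
    one_equals_three = filecmp.cmp(one, three, shallow=False)
finally:
    shutil.rmtree(base)
```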

cmpfiles

cmpfiles(dir1, dir2, common, shallow=True)

Loops on sequence common. Each item of common is a string that names a file present in both directories dir1 and dir2. cmpfiles returns a tuple whose items are three lists of strings: (equal, diff, errs). equal is the list of names of files that are equal in both directories, diff is the list of names of files that differ between directories, and errs is the list of names of files that could not be compared (because they do not exist in both directories, or there is no permission to read them). Argument shallow is the same as for function cmp.

dircmp

class dircmp(dir1, dir2, ignore=('RCS', 'CVS', 'tags'), hide=('.', '..'))

Creates a new directory-comparison instance object, comparing directories named dir1 and dir2, ignoring names listed in ignore, and hiding names listed in hide. A dircmp instance d exposes three methods:


d.report( )

Outputs to sys.stdout a comparison between dir1 and dir2


d.report_partial_closure( )

Outputs to sys.stdout a comparison between dir1 and dir2 and their common immediate subdirectories


d.report_full_closure( )

Outputs to sys.stdout a comparison between dir1 and dir2 and all their common subdirectories, recursively

A dircmp instance d supplies several attributes, computed just in time (i.e., only if and when needed, thanks to a __getattr__ special method) so that using a dircmp instance suffers no unnecessary overhead. d's attributes are:


d.common

Files and subdirectories that are in both dir1 and dir2


d.common_dirs

Subdirectories that are in both dir1 and dir2


d.common_files

Files that are in both dir1 and dir2


d.common_funny

Names that are in both dir1 and dir2 for which os.stat reports an error or returns different kinds for the versions in the two directories


d.diff_files

Files that are in both dir1 and dir2 but with different contents


d.funny_files

Files that are in both dir1 and dir2 but could not be compared


d.left_list

Files and subdirectories that are in dir1


d.left_only

Files and subdirectories that are in dir1 and not in dir2


d.right_list

Files and subdirectories that are in dir2


d.right_only

Files and subdirectories that are in dir2 and not in dir1


d.same_files

Files that are in both dir1 and dir2 with the same contents


d.subdirs

A dictionary whose keys are the strings in common_dirs; the corresponding values are instances of dircmp for each subdirectory
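
A minimal sketch of a dircmp instance at work, assuming two scratch directories populated with hypothetical files. The attributes are read while the directories still exist, since they are computed only when first accessed.

```python
import filecmp
import os
import shutil
import tempfile

def write_text(path, text):
    """Create (or overwrite) path with the given text."""
    f = open(path, 'w')
    try:
        f.write(text)
    finally:
        f.close()

left = tempfile.mkdtemp()
right = tempfile.mkdtemp()
try:
    write_text(os.path.join(left, 'shared.txt'), 'same\n')
    write_text(os.path.join(right, 'shared.txt'), 'same\n')
    write_text(os.path.join(left, 'changed.txt'), 'old contents\n')
    write_text(os.path.join(right, 'changed.txt'), 'new\n')
    write_text(os.path.join(left, 'only_left.txt'), '')

    d = filecmp.dircmp(left, right)
    left_only = sorted(d.left_only)
    same_files = sorted(d.same_files)
    diff_files = sorted(d.diff_files)
finally:
    shutil.rmtree(left)
    shutil.rmtree(right)
```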


10.8.7. The shutil Module

The shutil module (an abbreviation for shell utilities) supplies functions to copy and move files, and to remove an entire directory tree. In addition to offering functions that are directly useful, the source file shutil.py in the standard Python library is an excellent example of how to use many os functions.

copy

copy(src, dst)

Copies the contents of file src, creating or overwriting file dst. If dst is a directory, the target is a file with the same base name as src in directory dst. copy also copies permission bits, but not last-access and modification times.

copy2

copy2(src, dst)

Like copy, but also copies times of last access and modification.

copyfile

copyfile(src, dst)

Copies just the contents (not permission bits, nor last-access and modification times) of file src, creating or overwriting file dst.

copyfileobj

copyfileobj(fsrc, fdst, bufsize=16384)

Copies all bytes from file object fsrc, which must be open for reading, to file object fdst, which must be open for writing. Copies no more than bufsize bytes at a time if bufsize is greater than 0. File objects are covered in "File Objects" on page 216.
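
A minimal sketch of copyfileobj. As an assumption made only to keep the example self-contained, in-memory io.BytesIO objects (module io, Python 2.6 or later) stand in for real open files; the buffer size is passed positionally.

```python
import io
import shutil

source = io.BytesIO(b'x' * 100000)   # stands in for a file open for reading
target = io.BytesIO()                # stands in for a file open for writing

# Copy in chunks of at most 4096 bytes.
shutil.copyfileobj(source, target, 4096)

copied = target.getvalue()
```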

copymode

copymode(src, dst)

Copies permission bits of file or directory src to file or directory dst. Both src and dst must exist. Does not change dst's contents, nor its file or directory status.

copystat

copystat(src, dst)

Copies permission bits and times of last access and modification of file or directory src to file or directory dst. Both src and dst must exist. Does not change dst's contents, nor its file or directory status.

copytree

copytree(src, dst, symlinks=False)

Copies the directory tree rooted at src into the destination directory named by dst. dst must not already exist: copytree creates it. copytree copies each file by using function copy2. When symlinks is true, copytree creates symbolic links in the new tree when it finds symbolic links in the source tree. When symlinks is false, copytree follows each symbolic link it finds and copies the linked-to file with the link's name. On platforms that do not have the concept of a symbolic link, such as Windows, copytree ignores argument symlinks.

move

move(src, dst)

Moves file or directory src to dst. First tries os.rename. Then, if that fails (because src and dst are on separate filesystems, or because they're files and dst already exists), copies src to dst (copy2 for a file, copytree for a directory), then removes src (os.unlink for a file, rmtree for a directory).

rmtree

rmtree(path, ignore_errors=False, onerror=None)

Removes the directory tree rooted at path. When ignore_errors is true, rmtree ignores errors. When ignore_errors is false and onerror is None, any error raises an exception. When onerror is not None, it must be callable with three parameters: func, path, and excp. func is the function that raises an exception (os.remove or os.rmdir), path is the path passed to func, and excp is the tuple of information that sys.exc_info( ) returns. If onerror raises any exception x, rmtree terminates, and exception x propagates.
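
The tree-level functions of shutil can be sketched together as follows, assuming a scratch directory and hypothetical names. copytree needs a destination that does not exist yet, move relocates the copy, and rmtree removes everything at the end.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'src')
dst = os.path.join(base, 'dst')          # must not exist before copytree
os.makedirs(os.path.join(src, 'sub'))
f = open(os.path.join(src, 'sub', 'note.txt'), 'w')
f.write('hello\n')
f.close()

shutil.copytree(src, dst)
copied = os.path.isfile(os.path.join(dst, 'sub', 'note.txt'))

new_home = os.path.join(base, 'moved')
shutil.move(dst, new_home)
moved = (os.path.isdir(new_home) and not os.path.exists(dst))

shutil.rmtree(base)
removed = not os.path.exists(base)
```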


10.8.8. File Descriptor Operations

The os module supplies functions to handle file descriptors, which are integers that the operating system uses as opaque handles to refer to open files. Python file objects (covered in "File Objects" on page 216) are almost invariably better for input/output (I/O) tasks, but sometimes working at file-descriptor level lets you perform some operation more rapidly or elegantly. Note that file objects and file descriptors are not interchangeable in any way.

You can get the file descriptor n of a Python file object f by calling n=f.fileno( ). You can wrap a new Python file object f around an open file descriptor fd by calling f=os.fdopen(fd). On Unix-like and Windows platforms, some file descriptors are preallocated when a process starts: 0 is the file descriptor for the process's standard input, 1 for the process's standard output, and 2 for the process's standard error.
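
The two conversions might be sketched like this, assuming a scratch file whose descriptor comes from tempfile.mkstemp. Closing the file object also closes the descriptor it wraps.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()    # fd is a plain integer file descriptor

f = os.fdopen(fd, 'w')           # wrap the descriptor in a file object
same_descriptor = (f.fileno() == fd)
f.write('written through the file object\n')
f.close()                        # also closes fd

size = os.path.getsize(path)
os.remove(path)
```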

os provides the following functions for working with file descriptors.

close

close(fd)

Closes file descriptor fd.

dup

dup(fd)

Returns a file descriptor that duplicates file descriptor fd.

dup2

dup2(fd, fd2)

Duplicates file descriptor fd to file descriptor fd2. If file descriptor fd2 is already open, dup2 first closes fd2.

fdopen

fdopen(fd, mode='r', bufsize=-1)

Returns a Python file object wrapping file descriptor fd. mode and bufsize have the same meaning as for Python's built-in open, covered in "Creating a File Object with open" on page 216.

fstat

fstat(fd)

Returns a stat_result instance x, with information about the file that is open on file descriptor fd. Table 10-2 covers x's contents.

lseek

lseek(fd, pos, how)

Sets the current position of file descriptor fd to the signed integer byte offset pos and returns the resulting byte offset from the start of the file. how indicates the reference (point 0). When how is 0, the reference is the start of the file; when 1, the current position; when 2, the end of the file. In particular, lseek(fd, 0, 1) returns the current position's byte offset from the start of the file without affecting the current position. Normal disk files support seeking; calling lseek on a file that does not support seeking (e.g., a file open for output to a terminal) raises an exception. In Python 2.5, module os has attributes named SEEK_SET, SEEK_CUR, and SEEK_END, with values of 0, 1, and 2, respectively, to use instead of the bare integer constants for readability.

open

open(file, flags, mode=0777)

Returns a file descriptor, opening or creating a file named by string file. If open creates the file, it uses mode as the file's permission bits. flags is an int, and is normally obtained by bitwise ORing one or more of the following attributes of os:


O_RDONLY O_WRONLY O_RDWR

Opens file for read-only, write-only, or read/write, respectively (mutually exclusive: exactly one of these attributes must be in flags)


O_NDELAY O_NONBLOCK

Opens file in nonblocking (no-delay) mode if the platform supports this


O_APPEND

Appends any new data to file's previous contents


O_DSYNC O_RSYNC O_SYNC O_NOCTTY

Sets synchronization mode accordingly if the platform supports this


O_CREAT

Creates file if file does not already exist


O_EXCL

Raises an exception if file already exists


O_TRUNC

Throws away previous contents of file (incompatible with O_RDONLY)


O_BINARY

Opens file in binary rather than text mode on non-Unix platforms (innocuous and without effect on Unix-like platforms)
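
A common combination of these flags is O_CREAT with O_EXCL, which creates a file only if it does not already exist. The sketch below assumes a scratch directory and a hypothetical file name, and relies on the default mode argument.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
path = os.path.join(base, 'lock')            # hypothetical file name
flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL

fd = os.open(path, flags)                    # first call creates the file
os.close(fd)

try:
    os.open(path, flags)                     # file exists now, so O_EXCL objects
    second_open_failed = False
except OSError:
    second_open_failed = True

shutil.rmtree(base)
```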

pipe

pipe( )

Creates a pipe and returns a pair of file descriptors (r, w), respectively open for reading and writing.
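
A minimal sketch of a pipe used within a single process, which is enough to show the roles of the two descriptors. The b prefix on the literal (Python 2.6 or later) marks a plain byte string.

```python
import os

r, w = os.pipe()

os.write(w, b'ping')
os.close(w)                  # closing the write end lets the reader see end-of-file

first = os.read(r, 1024)     # the four bytes just written
second = os.read(r, 1024)    # empty: no more data and no writer left
os.close(r)
```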

read

read(fd, n)

Reads up to n bytes from file descriptor fd and returns them as a plain string. Reads and returns m<n bytes when only m more bytes are currently available for reading from the file. In particular, returns the empty string when no more bytes are currently available from the file, typically because the file is finished.

write

write(fd, s)

Writes all bytes from plain string s to file descriptor fd and returns the number of bytes written (i.e., len(s)).
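
The descriptor-level functions of this section can be sketched together as a write, seek, and read round trip. The scratch file comes from tempfile.mkstemp, which returns a descriptor open for both reading and writing; the b prefix (Python 2.6 or later) marks plain byte strings.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    written = os.write(fd, b'hello world')   # number of bytes written
    position = os.lseek(fd, 0, 1)            # how=1: report the current position

    os.lseek(fd, 6, 0)                       # how=0: offset from the start
    tail = os.read(fd, 100)                  # fewer than 100 bytes remain
    at_end = os.read(fd, 100)                # empty string at end of file
finally:
    os.close(fd)
    os.remove(path)
```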


