10.3. File Objects
As mentioned in "Organization of This Chapter" on page 215, file is a built-in type in Python and the single most common way for your Python programs to read or write data. With a file object, you can read and/or write data to a file as seen by the underlying operating system. Python reacts to any I/O error related to a file object by raising an instance of built-in exception class IOError. Errors that cause this exception include open failing to open or create a file, calls to a method on a file object to which that method doesn't apply (e.g., calling write on a read-only file object, or calling seek on a nonseekable file), and I/O errors diagnosed by a file object's methods. This section covers file objects, as well as the important issue of making temporary files.
10.3.1. Creating a File Object with open
To create a Python file object, call the built-in open with the following syntax:
open(filename, mode='r', bufsize=-1)
open opens the file named by plain string filename, which denotes any path to a file. open returns a Python file object f, which is an instance of the built-in type file. Currently, calling file directly is like calling open, but you should call open, which may become a factory function in some future release of Python. If you explicitly pass a mode string, open can also create filename if the file does not already exist (depending on the value of mode, as we'll discuss in a moment). In other words, despite its name, open is not just for opening existing files: it can also create new ones.
10.3.1.1. File mode
mode is a string that indicates how the file is to be opened (or created). mode can be:
10.3.1.2. Binary and text modes
The mode string may also have any of the values just explained followed by a b or t. b denotes binary mode, while t denotes text mode. When the mode string has neither b nor t, the default is text mode (i.e., 'r' is like 'rt', 'w' is like 'wt', and so on).
On Unix, there is no difference between binary and text modes. On Windows, when a file is open in text mode, '\n' is returned each time the string that is the value of os.linesep (the line termination string) is encountered while the file is being read. Conversely, a copy of os.linesep is written each time you write '\n' to the file.
This widespread convention, originally developed in the C language, lets you read and write text files on any platform without worrying about the platform's line-separation conventions. However, except on Unix-like platforms, you do have to know (and tell Python, by passing the proper mode argument to open) whether a file is binary or text. In this chapter, for simplicity, I use \n to refer to the line-termination string, but remember that the string is in fact os.linesep in files on the filesystem, translated to and from \n in memory only for files opened in text mode.
Python also supports universal newlines, which let you open a text file for reading in mode 'U' (or, equivalently, 'rU') when you don't know how line separators are encoded in the file. This is useful, for example, when you share text files across a network between machines with different operating systems. Mode 'U' takes any of '\n', '\r', and '\r\n' as a line separator, and translates any line separator to '\n'.
bufsize is an integer that denotes the buffer size you're requesting for the file. When bufsize is less than 0, the operating system's default is used. Normally, this default is line buffering for files that correspond to interactive consoles and some reasonably sized buffer, such as 8,192 bytes, for other files. When bufsize equals 0, the file is unbuffered; the effect is as if the file's buffer were flushed every time you write anything to the file. When bufsize equals 1, the file is line-buffered, which means the file's buffer is flushed every time you write \n to the file. When bufsize is greater than 1, the file uses a buffer of about bufsize bytes, rounded up to some reasonable amount. On some platforms, you can change the buffering for files that are already open, but there is no cross-platform way to do this.
10.3.1.4. Sequential and nonsequential access
A file object f is inherently sequential (i.e., a stream of bytes). When you read from a file, you get bytes in the sequential order in which they're present in the file. When you write to a file, the bytes you write are put in the file in the order in which you write them.
To allow nonsequential access, each built-in file object keeps track of its current position (the position on the underlying file where the next read or write operation will start transferring data). When you open a file, the initial position is at the start of the file. Any call to f.write on a file object f opened with a mode of 'a' or 'a+' always sets f's position to the end of the file before writing data to f. When you read or write n bytes on file object f, f's position advances by n. You can query the current position by calling f.tell and change the position by calling f.seek, which are both covered in the next section.
10.3.2. Attributes and Methods of File Objects
A file object f supplies the attributes and methods documented in this section.
10.3.3. Iteration on File Objects
A file object f, open for text-mode reading, is also an iterator whose items are the file's lines. Thus, the loop:
for line in f:
iterates on each line of the file. Due to buffering issues, interrupting such a loop prematurely (e.g., with break), or calling f.next( ) instead of f.readline( ), leaves the file's position set to an arbitrary value. If you want to switch from using f as an iterator to calling other reading methods on f, be sure to set the file's position to a known value by appropriately calling f.seek. On the plus side, a loop directly on f has very good performance, since these specifications allow the loop to use internal buffering to minimize I/O without taking up excessive amounts of memory even for huge files.
10.3.4. File-Like Objects and Polymorphism
An object x is file-like when it behaves polymorphically to a file, meaning that a function (or some other part of a program) can use x as if x were a file. Code using such an object (known as the client code of the object) typically receives the object as an argument or gets it by calling a factory function that returns the object as the result. For example, if the only method that client code calls on x is x.read( ), without arguments, then all x needs to supply in order to be file-like for that code is a method read that is callable without arguments and returns a string. Other client code may need x to implement a larger subset of file methods. File-like objects and polymorphism are not absolute concepts: they are relative to demands placed on an object by some specific client code.
Polymorphism is a powerful aspect of object-oriented programming, and file-like objects are a good example of polymorphism. A client-code module that writes to or reads from files can automatically be reused for data residing elsewhere, as long as the module does not break polymorphism by the dubious practice of type testing. When we discussed the built-ins type and isinstance in type on page 157 and isinstance on page 163, I mentioned that type testing is often best avoided, since it blocks the normal polymorphism that Python otherwise supplies. Sometimes, you may have no choice. For example, the marshal module (covered in "The marshal Module" on page 278) demands real file objects. Therefore, when your client code needs to use marshal, your code must deal with real file objects, not just file-like ones. However, such situations are rare. Most often, to support polymorphism in your client code, all you have to do is avoid type testing.
You can implement a file-like object by coding your own class (as covered in Chapter 5) and defining the specific methods needed by client code, such as read. A file-like object fl need not implement all the attributes and methods of a true file object f. If you can determine which methods client code calls on fl, you can choose to implement only that subset. For example, when fl is only going to be written, fl doesn't need "reading" methods, such as read, readline, and readlines.
When you implement a writable file-like object fl, make sure that fl.softspace can be read and written, and don't alter nor interpret softspace in any way, if you want fl to be usable by print (covered in "The print Statement" on page 256). Note that this behavior is the default when you write fl's class in Python. You need to take specific care only when fl's class overrides special method _ _setattr_ _, or otherwise controls access to its instances' attributes (e.g., by defining _ _slots_ _), as covered in Chapter 5. In particular, if your new-style class defines _ _slots_ _, then one of the slots must be named softspace if you want instances of your class to be usable as destinations of print statements.
If the main reason you want a file-like object instead of a real file object is to keep the data in memory, use modules StringIO and cStringIO, covered in "The StringIO and cStringIO Modules" on page 229. These modules supply file-like objects that hold data in memory and behave polymorphically to file objects to a wide extent.
10.3.5. The tempfile Module
The tempfile module lets you create temporary files and directories in the most secure manner afforded by your platform. Temporary files are often an excellent solution when you're dealing with an amount of data that might not comfortably fit in memory, or when your program needs to write data that another process will then use.
The order of the parameters for the functions in this module is a bit confusing: to make your code more readable, always call these functions with named-argument syntax. Module tempfile exposes the following functions.