Previous Page
Next Page

11.4. The Python Database API (DBAPI) 2.0

As I mentioned earlier, the Python standard library does not come with an RDBMS interface, but there are many free third-party modules that let your Python programs access specific databases. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI. A new version of the DBAPI (possibly to be known as 3.0) is likely to appear in the future, but currently there are no firm plans or schedules for one. Programs written against DBAPI 2.0 should work with minimal or no changes with any future DBAPI 3.0, although 3.0, if and when it comes, will no doubt offer further enhancements that future programs will be able to take advantage of.

If your Python program runs only on Windows, you might prefer to access databases by using Microsoft's ADO package through COM. For more information on using Python on Windows, see Python Programming on Win32, by Mark Hammond and Andy Robinson (O'Reilly). Since ADO and COM are platform-specific, and this book focuses on cross-platform use of Python, I do not cover ADO and COM further in this book. However, at http://adodbapi.sourceforge.net/ you will find a useful Python extension that lets you access ADO indirectly through DBAPI.

After importing a DBAPI-compliant module, call the module's connect function with suitable parameters. connect returns an instance of Connection, which represents a connection to the database. The instance supplies commit and rollback methods to deal with transactions, a close method to call as soon as you're done with the database, and a cursor method to return an instance of Cursor. The cursor supplies the methods and attributes used for database operations. A DBAPI-compliant module also supplies exception classes, descriptive attributes, factory functions, and type-description attributes.

11.4.1. Exception Classes

A DBAPI-compliant module supplies exception classes Warning, Error, and several subclasses of Error. Warning indicates such anomalies as data truncation during insertion. Error's subclasses indicate various kinds of errors that your program can encounter when dealing with the database and the DBAPI-compliant module that interfaces to it. Generally, your code uses a statement of the form:

try:
   ...
except module.Error, err:
   ...

to trap all database-related errors that you need to handle without terminating.

11.4.2. Thread Safety

When a DBAPI-compliant module has an attribute threadsafety greater than 0, the module is asserting some level of thread safety for database interfacing. Rather than relying on this, it's safer and more portable to ensure that a single thread has exclusive access to any given external resource, such as a database, as outlined in "Threaded Program Architecture" on page 350.

11.4.3. Parameter Style

A DBAPI-compliant module has an attribute paramstyle to identify the style of markers used as placeholders for parameters. Insert such markers in SQL statement strings that you pass to methods of Cursor instances, such as method execute, to use runtime-determined parameter values. Say, for example, that you need to fetch the rows of database table ATABLE where field AFIELD equals the current value of Python variable x. Assuming the cursor instance is named c, you could perform this task by using Python's string-formatting operator %, as follows:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=%r' %
x)

However, this is not the recommended approach. This approach generates a different statement string for each value of x, requiring such statements to be parsed and prepared anew each time. With parameter substitution, you pass to execute a single statement string, with a placeholder instead of the parameter value. This lets execute perform parsing and preparation just once, giving potentially better performance. For example, if a module's paramstyle attribute is 'qmark', you can express the above query as:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=?',
[x])

The read-only string attribute paramstyle tells your program how it should use parameter substitution with that module. The possible values of paramstyle are:


format

The marker is %s, as in string formatting. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=%s',
[x])


named

The marker is :name, and parameters are named. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=:x',
{'x':x})


numeric

The marker is :n, giving the parameter's number. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=:1',
[x])


pyformat

The marker is %(name)s, and parameters are named. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=%(x)s',
{'x':x})


qmark

The marker is ?. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=?', [x])

When paramstyle is neither 'pyformat' nor 'named', the second argument of method execute is a sequence. When parameters are named (i.e., paramstyle is 'pyformat' or 'named'), the second argument of method execute is a dictionary.

11.4.4. Factory Functions

Parameters passed to the database via placeholders must typically be of the right type: this means Python numbers (integers or floating-point values), strings (plain or Unicode), and None to represent SQL NULL. There is no type universally used to represent dates, times, and binary large objects (BLOBs). A DBAPI-compliant module supplies factory functions to build such objects. The types used for this purpose by most DBAPI-compliant modules are those supplied by modules datetime and mxDateTime (covered in Chapter 12), and strings or buffer types for BLOBs. The factory functions specified by the DBAPI are as follows.

Binary

Binary(string)

Returns an object representing the given string of bytes as a BLOB.

Date

Date(year,month,day)

Returns an object representing the specified date.

DateFromTicks

DateFromTicks(s)

Returns an object representing the date s seconds after the epoch of module time, covered in Chapter 12. For example, DateFromTicks(time.time( )) is "today."

Time

Time(hour,minute,second)

Returns an object representing the specified time.

TimeFromTicks

TimeFromTicks(s)

Returns an object representing the time s seconds after the epoch of module time, covered in Chapter 12. For example, TimeFromTicks(time.time( )) is "now."

Timestamp

Timestamp(year,month,day,hour,minute,second)

Returns an object representing the specified date and time.

Timestamp-FromTicks

TimestampFromTicks(s)

Returns an object representing the date and time s seconds after the epoch of module time, covered in Chapter 12. For example, TimestampFromTicks(time.time( )) is the current date and time.


11.4.5. Type Description Attributes

A Cursor instance's attribute description describes the types and other characteristics of each column of a query. Each column's type (the second item of the tuple describing the column) equals one of the following attributes of the DBAPI-compliant module:


BINARY

Describes columns containing BLOBs


DATETIME

Describes columns containing dates, times, or both


NUMBER

Describes columns containing numbers of any kind


ROWID

Describes columns containing a row-identification number


STRING

Describes columns containing text of any kind

A cursor's description, and in particular each column's type, is mostly useful for introspection about the database your program is working with. Such introspection can help you write general modules and work with tables using different schemas, including schemas that may not be known at the time you are writing your code.

11.4.6. The connect Function

A DBAPI-compliant module's connect function accepts arguments that vary depending on the kind of database and the specific module involved. The DBAPI standard recommends, but does not mandate, that connect accept named arguments. In particular, connect should at least accept optional arguments with the following names:


database

Name of the specific database to connect


dsn

Data-source name to use for the connection


host

Hostname on which the database is running


password

Password to use for the connection


user

Username for the connection

11.4.7. Connection Objects

A DBAPI-compliant module's connect function returns an object x that is an instance of class Connection. x supplies the following methods.

close

x.close( )

Terminates the database connection and releases all related resources. Call close as soon as you're done with the database. Keeping database connections needlessly open can be a serious resource drain on the system.

commit

x.commit( )

Commits the current transaction in the database. If the database does not support transactions, x.commit( ) is an innocuous no-op.

cursor

x.cursor( )

Returns a new instance of class Cursor, covered in "Cursor Objects" on page 297.

rollback

x.rollback( )

Rolls back the current transaction in the database. If the database does not support transactions, x.rollback( ) raises an exception. The DBAPI recommends, but does not mandate, that for databases that do not support transactions class Connection supplies no rollback method, so that x.rollback( ) raises AttributeError. You can test whether transaction support is present with hasattr(x,'rollback').


11.4.8. Cursor Objects

A Connection instance provides a cursor method that returns an object c that is an instance of class Cursor. A SQL cursor represents the set of results of a query and lets you work with the records in that set, in sequence, one at a time. A cursor as modeled by the DBAPI is a richer concept, since it represents the only way in which your program executes SQL queries in the first place. On the other hand, a DBAPI cursor allows you only to advance in the sequence of results (some relational databases, but not all, also provide richer cursors that are able to go backward as well as forward), and does not support the SQL clause WHERE CURRENT OF CURSOR. These limitations of DBAPI cursors enable DBAPI-compliant modules to provide cursors even on RDBMSes that provide no real SQL cursors at all. An instance of class Cursor c supplies many attributes and methods; the most frequently used ones are documented here.

close

c.close( )

Closes the cursor and releases all related resources.

description

A read-only attribute that is a sequence of seven-item tuples, one per column in the last query executed:

name, typecode, displaysize, internalsize, precision, scale, nullable

c.description is None if the last operation on c was not a query or returned no usable description of the columns involved. A cursor's description is mostly useful for introspection about the database your program is working with. Such introspection can help you write general modules that are able to work with tables using different schemas, including schemas that may not be fully known at the time you are writing your code.

execute

c.execute(statement,parameters=None)

Executes a SQL statement string on the database with the given parameters. parameters is a sequence when the module's paramstyle is 'format', 'numeric', or 'qmark', and a dictionary when it's 'named' or 'pyformat'.

executemany

c.executemany(statement,*parameters)

Executes a SQL statement on the database, once for each item of the given parameters. parameters is a sequence of sequences when the module's paramstyle is 'format', 'numeric', or 'qmark', and a sequence of dictionaries when it's 'named' or 'pyformat'. For example, the statement:

c.executemany('UPDATE atable SET x=? WHERE y=?',(12,23),(23,34))

when paramstyle is 'qmark' is equivalent to, but faster than, the two statements:

c.execute('UPDATE atable SET x=12 WHERE y=23')
c.execute('UPDATE atable SET x=23 WHERE y=34')

fetchall

c.fetchall( )

Returns all remaining result rows from the last query as a sequence of tuples. Raises an exception if the last operation was not a SELECT query.

fetchmany

c.fetchmany(n)

Returns up to n remaining result rows from the last query as a sequence of tuples. Raises an exception if the last operation was not a SELECT query.

fetchone

c.fetchone( )

Returns the next result row from the last query as a tuple. Raises an exception if the last operation was not a SELECT query.

rowcount

A read-only attribute that specifies the number of rows fetched or affected by the last operation, or -1 if the module is unable to determine this value.


11.4.9. DBAPI-Compliant Modules

Whatever relational database you want to use, there's at least one (often more than one) Python DBAPI-compliant module downloadable from the Internet. All modules listed in the following sections, except mxODBC (and SAPDB, which uses GPL) have liberal licenses that are similar to Python's: you can use them freely in either open source or closed source programs. mxODBC can be used freely for noncommercial purposes, but you must purchase a license for commercial use. There are so many relational databases that I can't list them all, but here are some of the most popular ones:


ODBC

Open DataBase Connectivity (ODBC) is a standard way to connect to many different databases, including some not supported by other DBAPI-compliant modules, such as Microsoft Jet (also known as the Access database). The Python Windows distribution contains an odbc module, but the module is unsupported and follows an older version of the DBAPI, not the current version 2.0. On Unix or Windows, use mxODBC (http://www.lemburg.com/files/Python/mxODBC.html). mxODBC's paramstyle is 'qmark'. Its connect function accepts three optional arguments: dsn, user, and password.


Oracle

Oracle is a widespread commercial RDBMS. To interface to Oracle, you can use DCOracle2, available at http://www.zope.org/Members/matt/dco2. DCOracle2's paramstyle is 'numeric'. Its connect function accepts a single optional, unnamed argument string with syntax:

'user/password@service'

cx_oracle (http://www.python.net/crew/atuining/cx_Oracle/index.html) is an alternative. paramstyle is 'named'; the connect function accepts a string in the same format as DCOracle2's, or many optional parameters named dsn, user, passwd, and more besides.


Microsoft SQL Server

To interface to Microsoft SQL Server, I recommend module mssqldb, available at http://www.object-craft.com.au/projects/mssql/. mssqldb's paramstyle is 'qmark'. Its connect function accepts three argumentsnamed dsn, user, and passwdas well as an optional database argument. pymssql (http://pymssql.sourceforge.net/) is an alternative.


DB/2

For IBM DB/2, try module DB2, available at http://sourceforge.net/projects/pydb2. DB2's paramstyle is 'format'. Its connect function accepts three optional arguments: named dsn, uid, and pwd.


MySQL

MySQL is a widespread, open source RDBMS. To interface to MySQL, try MySQLdb, available at http://sourceforge.net/projects/mysql-python. MySQLdb's paramstyle is 'format'. Its connect function accepts four optional arguments: named db, host, user, and passwd.


PostgreSQL

PostgreSQL is an excellent open source RDBMS. To interface to PostgreSQL, I recommend psycopg, available at http://initd.org/Software/psycopg. psycopg's paramstyle is 'pyformat'. Its connect function accepts a single mandatory string argument, named dsn, with the syntax:

'host=host dbname=dbname
user=username password=password'


SAP DB

SAP DB, once known as Adabas, is a powerful RDBMS that used to be closed source but is now open source. SAP DB comes with sapdbapi (available at http://www.sapdb.org/sapdbapi.html) as well as other useful Python modules. sapdbapi's paramstyle is 'pyformat'. Its connect function accepts three mandatory argumentsnamed user, password, and databaseand an optional argument named host.

11.4.10. Gadfly

Gadfly (http://gadfly.sf.net) is not an interface to some other RDBMS but rather a complete RDBMS engine written in Python. Gadfly supports a large subset of standard SQL. For example, Gadfly lacks NULL, but it does support VIEW. Gadfly can run as a daemon server, to which clients connect with TCP/IP. Alternatively, you can run the Gadfly engine directly in your application's process if you don't need other processes to be able to access the same database concurrently.

The gadfly module has several discrepancies from the DBAPI 2.0 (covered in "The Python Database API (DBAPI) 2.0" on page 292) because Gadfly implements a variant of the older DBAPI 1.0. The concepts are quite close, but several details differ. The main differences are:

  • gadfly does not supply custom exception classes, so Gadfly operations that fail raise normal Python exceptions, such as IOError, NameError, etc.

  • gadfly does not supply a paramstyle attribute. However, the module behaves as if it supplied a paramstyle of 'qmark'.

  • gadfly does not supply a function named connect; use the gadfly.gadfly or gadfly.client.gfclient functions instead.

  • gadfly does not supply factory functions for data types.

  • Gadfly cursors do not supply the executemany method. Instead, in the specific case in which the SQL statement is an INSERT, the execute method optionally accepts as its second argument a list of tuples and inserts all the data.

  • Gadfly cursors do not supply the rowcount method.

The gadfly module supplies the following functions.

gadfly

gadfly.gadfly(dbname,dirpath)

Returns a connection object for the database named dbname, which must have been previously created in the directory indicated by string dirpath. The database engine runs in the same process as your application.

gfclient

gadfly.client.gfclient(policyname, port, password, host)

Returns a connection object for the database served by a gfserve process on the given host and port. policyname identifies the level of access required, and is often 'admin' to specify unlimited access.


11.4.11. SQLite

SQLite (http://www.sqlite.org) is similar to Gadfly in that it's not an interface to some other RDBMS but rather a complete, self-contained RDBMS engine. However, SQLite is written in C and may offer better performance, and accessibility from multiple programming languages, compared to Gadfly. The most popular Python interface to SQLite is PySQLite, available at http://initd.org/tracker/pysqlite. It's quite compatible with DBAPI 2.0, except that it doesn't support types (all data is actually held by SQLite as strings). PySQLite's paramstyle is 'qmark'. An alternative is APSW, available at http://www.rogerbinns.com/apsw.html. APSW does not even try to meet the DBAPI 2.0's specs but rather faithfully implements SQLite's own preferred API.

In Python 2.5, PySQLite is included in the Python Standard Library as package sqlite3.


Previous Page
Next Page