Choosing an API

This section provides general guidelines to help you choose an API for various types of applications. It compares the capabilities of the C, DBI, and PHP APIs to give you some idea of their relative strengths and weaknesses, and to indicate when you might choose one over another.

I am not advocating any one of these languages over the others, although I do have my preferences. You will have your own preferences, too, as did the technical reviewers for this book. In fact, one reviewer felt that I should emphasize the importance of C for MySQL programming to a much greater extent, whereas another thought I should come down much harder on C programming and discourage its use! The lesson from these varying viewpoints is that you should weigh the factors discussed in this section and come to your own conclusions.

A number of considerations can enter in to your assessment of which API to choose for a particular task:

Intended execution environment. The context in which you expect the application to be used.
Performance. How efficiently applications perform when written in the API language.
Ease of development. How convenient the API and its language make application writing.
Portability. Whether the application will be used for database systems other than MySQL.

The following discussion further examines each of these factors. Be aware that some of the factors interact. For example, you might want an application that performs well, but it can be just as important to use a language that lets you develop the application quickly even if it doesn't perform quite as efficiently.

Execution Environment

When you write an application, you generally have some idea of the environment in which it will be used. For example, it might be a report generator program that you invoke from the shell, or an accounts payable summary program that runs as a cron job at the end of each month. Commands run from the shell or from cron generally stand on their own and require little information from the execution environment. On the other hand, you might be writing an application intended to be invoked by a Web server. Such a program might expect to be able to extract very specific types of information from its execution environment: What browser is the client using? What parameters were entered into a mailing list subscription request form? Did the client supply the correct password for accessing personnel information?

Each API language varies in its suitability for writing applications in these differing environments:

C is a general-purpose language, so in principle you can use it for anything. In practice, C tends to be used more often for standalone programs rather than for Web programming. One reason might be that it's not as easy to perform text processing and memory management in C as it is in Perl or in PHP, and those capabilities tend to be heavily used in Web applications.
Perl, like C, is suitable for writing standalone programs. However, it also happens that Perl is quite useful for Web site developmentfor example, by using the CGI.pm module. This makes Perl a handy language for writing applications that link MySQL with the Web. Such an application can interface to the Web via the CGI.pm module and interface to MySQL using DBI.
PHP is intended by design for writing Web applications, so that's obviously the environment to which it is best suited. Furthermore, database access is one of PHP's biggest strengths, so it's a natural choice for Web applications that perform MySQL-related tasks. It's possible to use PHP as a standalone interpreter (for example, to execute scripts from the shell), but it's not used that way very much.

Given these considerations, C and Perl are the most likely candidate languages if you're writing a standalone application. For Web applications, Perl and PHP are most suitable. If you need to write both types of applications, but don't know any of these languages and want to learn as few as possible, Perl might be your best option.

Performance

All other things being equal, we prefer to have applications run as quickly as possible. However, the actual importance of performance tends to be related to the frequency with which a program is used. For a program that you run once a month as a cron job during the night, performance might not matter that much. On the other hand, if you run a program many times a second on a heavily used Web site, every bit of efficiency you gain can make a big difference. In the latter case, performance plays a significant role in the usefulness and appeal of your site. A slow site is annoying for users, no matter what the site is about, and if you depend on the site as a source of income, decreased performance translates directly into reduced revenue. You cannot service as many connections at a time, and visitors who tire of waiting may just give up and go elsewhere.

Performance assessment is a complex issue. The best indicator of how well your application will perform when written for a particular API is to write it under that API and try it. And the best comparative test is to implement it multiple times under different APIs to see how the versions stack up against each other. Of course, that's not how things usually work. More often, you just want to get your program written. After it's working, you can think about tuning it to see if it can run faster or use less memory, or if it has some other aspect that you can improve. But there are at least two general factors that you can count on as affecting performance in a relatively consistent way:

Compiled programs execute more quickly than interpreted scripts.
For interpreted languages used in a Web context, performance is better when the interpreter is invoked as a module that is part of the Web server itself rather than as a separate process.

Compiled Versus Interpreted Languages

As a general principle, compiled applications are more efficient, use less memory, and execute more quickly than an equivalent version of the program written in a scripting language. This is due to the overhead involved with the language interpreter that executes the scripts. C is compiled and Perl and PHP are interpreted, so C programs generally will run faster than Perl or PHP scripts. Thus, C might be the best choice for a heavily used program.

There are, of course, factors that tend to diminish this clear distinction. For one thing, writing in C generally gives you a faster program, but it's quite possible to write inefficient C programs. Writing a program in a compiled language is no automatic passport to better performance; it's still necessary to think about what you're doing. In addition, the difference between compiled and interpreted programs is lessened if a scripted application spends most of its time executing code in the compiled MySQL client library routines that are linked into the interpreter engine.

Standalone Versus Module Versions of Language Interpreters

For Web-based applications, script language interpreters are usually used in one of two formsat least for Apache, the Web server used in this book for writing Web applications:

You can arrange for Apache to invoke the script interpreter as a separate process. In this mode of operation, when Apache needs to run a Perl or PHP script, it starts the corresponding interpreter and tells it to execute the script. In this case, Apache uses the interpreters as CGI programsthat is, it communicates with them using the Common Gateway Interface (CGI) protocol.
The interpreter can be used as a module that is linked in directly to the Apache binary and that runs as part of the Apache process itself. In Apache terms, the Perl and PHP interpreters take the form of the mod_perl and mod_php modules.

Perl and PHP advocates will argue the speed advantages of their favorite interpreter, but all agree that the form in which the interpreter runs is a much bigger factor than the languages themselves. Either interpreter runs much faster as a module than as a standalone CGI application. With a standalone application, it's necessary to start up the interpreter each time a script is to be executed, so you incur a significant penalty in process-creation overhead. When used as a module within an already running Apache process, an interpreter's capabilities can be accessed from your Web pages instantly. This dramatically improves performance by reducing overhead and translates directly into an increased capacity to handle incoming requests and to dispatch them quickly.

The startup penalty for a standalone interpreter typically results in at least an order of magnitude poorer performance than the module interpreter. Interpreter startup cost is particularly significant when you consider that Web page serving typically involves quick transactions with light processing rather than substantial ones with a lot of processing. If you spend a lot of time just starting up and not very much actually executing the script, you're wasting most of your resources. It's like spending most of the day getting ready for work, arriving at 4 o'clock in the afternoon, and then going home at 5.

You might wonder why there is any benefit with the module versions of the interpretersafter all, you must still start up Apache itself, right? The savings comes from the fact that a given Apache process handles multiple requests. When Apache starts, it immediately spawns a pool of child processes to be used to handle incoming requests. When a request arrives that involves execution of a script, there is already an Apache process ready and waiting to handle it. Also, each instance of Apache services multiple requests, so the process startup cost is incurred only once per set of requests, not once per request. (Apache 2 can use multiple threads rather than separate processes, which reduces the overhead even more.)

One potentially significant difference between Perl and PHP is that Perl has a bigger memory footprint; Apache processes are larger with mod_perl linked in than with mod_php. PHP was designed under the assumptions that it must live cooperatively within another process and that it might be activated and deactivated multiple times within the life of that process. Perl was designed to be run from the command line as a standalone program, not as a language meant to be embedded in a Web server process. This probably contributes to its larger memory footprint; as a module, Perl simply isn't running in its natural environment. Other factors that contribute to the larger footprint are script caching and additional Perl modules that scripts use. In both cases, more code is brought into memory and remains there for the life of the Apache process. (To minimize this problem, there are techniques that allow you to designate only certain Apache processes as enabled for mod_perl. That way, you incur the extra memory overhead only for those processes that execute Perl scripts. The mod_perl area of the Apache Web site has a good discussion of various strategies from which to choose. Visit http://perl.apache.org/docs/ for more information.)

The standalone version of a language interpreter does have one advantage over its module counterpart, in that you can arrange for it to run scripts under a different user ID. The module versions run scripts under the same user ID as the Web server, which is typically an account with minimal privileges for security reasons. That doesn't work very well for scripts that require specific privileges (for example, if you need to be able to read or write protected files). You can combine the module and standalone approaches if you like: Use the module version by default and the standalone version for situations in which scripts need to run with the privileges of a particular user.

What this adds up to is that, whether you choose Perl or PHP, you should try to use it as an Apache module rather than by invoking a separate interpreter process. Reserve use of the standalone interpreter only for those cases that cannot be handled by the module, such as scripts that require special privileges. For these instances, you can process your script by using Apache's suEXEC mechanism to start up the interpreter under a given user ID. (In Apache 2, there is ongoing work designed to allow different groups of scripts to be run under specific user and group IDs, but this feature appears not yet to be ready.)

Development Time

The factors just described affect the performance of your applications, but raw execution efficiency may not be your only goal. Your own time is important, too, as is ease of programming, so another factor to consider in choosing an API for MySQL programming is how quickly you can develop your applications. If you can write a Perl or PHP script in half the time it takes to develop the equivalent C program, you might elect not to use the C API, even if the resulting application doesn't run quite as fast. It's often reasonable to be less concerned about a program's execution time than about the time you spend writing it, particularly for applications that aren't executed frequently. An hour of your time is worth a lot more than an hour of machine time!

Generally, scripting languages allow you to get a program going more quickly, especially for working out a prototype of the finished application. At least two factors contribute to this. First, scripting languages tend to provide higher-level constructs than compiled languages. This allows you to think at a higher level of abstraction, so you can think about what you want to do rather than about the details involved in doing it. For example, PHP associative arrays and Perl hashes are great time savers for maintaining data involving key/value relationships (such as student ID/student name pairs). C has no such construct. If you wanted to implement it in C, you would need to write code to handle many low-level details involving issues such as memory management and string manipulation, and you would need to debug it. This takes time.

Second, the development cycle has fewer steps for scripting languages than for compiled languages. With C, you engage in an edit-compile-test cycle during application development. Every time you modify a program, you must recompile it before testing. With Perl and PHP, the development cycle is simply edit-test because you can run a script immediately after each modification with no compiling. On the other hand, the C compiler enforces more constraints on your program in the form of stricter type checking. The greater discipline imposed by the compiler can help you avoid bugs that you would not catch as easily in looser languages, such as Perl and PHP. If you misspell a variable name in C, the compiler will warn you. PHP and Perl won't do so unless you ask them to. These tighter constraints can be especially valuable as your applications become larger and more difficult to maintain.

In general, the tradeoff is the usual one between compiled and interpreted languages for development time versus performance: Do you want to develop the program using a compiled language so that it will execute more quickly when it runs, but spend more time writing it? Or do you want to write the program as a script so that you can get it running in the least amount of time, even at the cost of some execution speed?

It's also possible to combine the two approaches. Write a script as a "first draft" to quickly develop an application prototype to test out your logic and make sure the algorithms are appropriate. If the program proves useful and is executed frequently enough that performance becomes a concern, you can recode it as a compiled application. This gives you the best of both worlds: quick prototyping for initial development of the application, and the best performance for the final product.

In a strict sense, the Perl DBI and PHP APIs give you no capabilities that are not already present in the C client library. This is because both of those APIs gain access to MySQL by having the MySQL C library linked into the Perl and PHP interpreters. However, the environment in which MySQL capabilities are embedded is very different for C than for Perl or PHP. Consider what tasks you'll need to perform as you interact with the MySQL server and ask how much each API language will help you carry them out. Here are some examples:

Memory management. In C, you find yourself working with malloc() and free() for any tasks involving dynamically allocated data structures. Perl and PHP handle that for you. For example, they allow arrays to grow in size automatically, and dynamic-length strings can be used without ever thinking about memory management.
Text manipulation. Perl has the most highly developed capabilities in this area, and PHP runs a close second. C is very rudimentary by comparison, coming in a distant third.

Of course, in C you can write your own libraries to encapsulate tasks such as memory management and text processing into functions that make the job easier. But then you still have to debug them, and you want your algorithms to be efficient, too. In these respects, it's a fair bet that the algorithms in Perl and PHP for these things have had the benefit of being examined by many pairs of eyes, so generally they should be both well debugged and reasonably efficient. You can save your own time by taking advantage of the time that others have already put into the job. (On the other hand, if an interpreter does happen to have a bug, you may simply have to live with it or try to find a workaround until the problem is fixed. When you write in C, you have a finer level of control over the behavior of your program.)

The languages differ in how "safe" they are. The C API provides the lowest-level interface to the server and enforces the least policy. In this sense, it provides the least amount of safety net. If you execute API functions out of order, you may be lucky and get an "out-of-sync" error, or you may be unlucky and have your program crash. Perl and PHP both protect you pretty well. A script will fail if you don't do things in the proper order, but the interpreter won't crash. Another fertile source of crashing bugs in C programs is the use of dynamically allocated memory and pointers associated with them. Perl and PHP handle memory management for you, so your scripts are much less likely to die from memory management bugs.

Development time is affected by the amount of external support that is available for a language. C external support is available in the form of wrapper libraries that encapsulate MySQL C API functions into routines that are easier to use. Libraries that do this are available for both C and C++. Perl undoubtedly has the largest number of add-ons, in the form of Perl modules (these are similar in concept to Apache modules). There is even an infrastructure in place designed to make it easy to locate and obtain these modules (the CPAN, or Comprehensive Perl Archive Network). Using Perl modules, you gain access to all kinds of functions without writing a line of code. Want to write a script that generates a report from a database, and then mail it to someone as an attachment? Just visit cpan.perl.org, get one of the MIME modules, and you have instant attachment-generation capability. PHP doesn't have the same level of organized external support, although as of PHP 4 the situation is changing with the development of PEAR.

Portability

The question of portability has to do with how easily a program written to use MySQL can be modified to use a different database engine. This may be something you don't care about. However, unless you can predict the future, it might be a little risky to say, "I'll never use this program with any database other than MySQL." Suppose that you get a different job and want to use your old programs, but your new employer uses a different database system? What then? If portability is a priority, you should consider the clear differences between APIs:

DBI provides the most portable API because database independence is an explicit DBI design goal.
PHP script portability is similar to DBI if scripts use the PEAR DB database abstraction module mentioned earlier. Without PEAR DB, PHP is less portable because it doesn't provide the same sort of uniform interface to various database engines that DBI does. The PHP function calls for each supported database tend to resemble those in the corresponding underlying C API. There is some smoothing of differences, but at a minimum, you'll need to change the names of the database-related functions you invoke. You may also have to revise your application's logic a bit as well because the interfaces for the various databases don't all work quite the same way.
The C API provides the least portability between databases. By its very nature it is designed specifically for MySQL.

Portability in the form of database independence is especially important when you need to access multiple database systems within the same application. This can involve simple tasks such as moving data from one RDBMS to another, or more complex undertakings, such as generating a report based on information combined from a number of database systems.