
9.4 Rolling Your Own Backup Script

There are always circumstances in which the standard tools aren't enough to get the job done. Perhaps they're not flexible enough, they're too slow, or they just don't work the way you'd like. The solution, of course, is to build your own tool. In doing so, you may decide to use the existing utilities or to just do your own thing.

Let's look at writing a simple MySQL backup script in Perl. While it isn't the most powerful or flexible script in the world, it can serve as a starting point for building a custom solution.

The script solves the following problem: you have a MySQL server that keeps all its data on a volume with snapshot capabilities. Every 12 hours, you'd like to perform the following tasks to make a good snapshot and gather a list of tables and their sizes:

  1. Flush and lock all MyISAM tables.

  2. Assemble a list of every table and its size.

  3. Initiate a snapshot.

  4. Unlock the tables.

  5. Output the list of table sizes.

The script's output can be captured and automatically mailed to a backup administrator. A cron entry like this does the job nicely if you're using Vixie cron (common on Linux and FreeBSD):

00 */12 * * * /usr/local/bin/

Otherwise, you can use the more traditional format:

00 0,12 * * * /usr/local/bin/ | mail

You'll find the complete script listed here.

#!/usr/bin/perl -w

# snapshot mysql and mail stats to backup admins

use strict;
use DBIx::DWIW;

$|=1;  # unbuffer output

my $db_user = 'backup_user';
my $db_pass = 'backup_pass';
my $db_name = 'mysql';
my $db_host = 'localhost';
my $command = '/usr/local/bin/snapshot';

my $conn    = DBIx::DWIW->Connect(DB => $db_name, User => $db_user,
                                  Pass => $db_pass, Host => $db_host)
    or die "Couldn't connect to MySQL!";

my @table_sizes;

# flush and lock all tables
$conn->Execute("FLUSH TABLES WITH READ LOCK");

# gather stats on the tables
my @db_list = $conn->FlatArray("SHOW DATABASES");

for my $db (@db_list)
{
    $conn->Execute("USE $db") or die "$!";
    my @table_info = $conn->Hashes("SHOW TABLE STATUS");

    for my $table (@table_info)
    {
        my $name = $table->{Name};
        my $size = $table->{Data_length};
        push @table_sizes, ["$db.$name", $size];
    }
}

# run the snapshot
system($command) == 0 or warn "snapshot failed: $?";

# unlock the tables
$conn->Execute("UNLOCK TABLES");

# sort by size and print
for my $info (sort { $b->[1] <=> $a->[1] } @table_sizes)
{
    printf "%-10s  %s\n", $info->[1], $info->[0];
}

__END__

Let's walk through the basic flow. The first thing to notice is that the script requires a module from CPAN. DBIx::DWIW simplifies most Perl work with MySQL.[7] After loading the required modules, the script defines the variables needed to connect to MySQL. It then executes a FLUSH TABLES WITH READ LOCK to make sure all changes are on disk and that no further changes will happen.

[7] The DWIW stands for Do What I Want, a play on Perl's Do What I Mean.
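If you'd rather not install DBIx::DWIW, the same flush-and-lock step can be done with the stock DBI module. The following is a minimal sketch, not part of the original script; the credentials and host are placeholders, and it assumes a running MySQL server:

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

# Placeholder credentials -- substitute your own backup account.
my $dbh = DBI->connect("DBI:mysql:database=mysql;host=localhost",
                       'backup_user', 'backup_pass',
                       { RaiseError => 1 })
    or die "Couldn't connect: $DBI::errstr";

# Flush all changes to disk and block further writes until we unlock.
$dbh->do("FLUSH TABLES WITH READ LOCK");

# ... take the snapshot here ...

$dbh->do("UNLOCK TABLES");
$dbh->disconnect;
```

The structure is the same either way: lock, snapshot, unlock, with the lock held for as short a time as possible.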

Once the tables have all been flushed and locked, the script collects a list of all the databases on the server and iterates through them. In each database, the script gets the status of all the tables using SHOW TABLE STATUS, which produces records that look like this:


*************************** 1. row ***************************
           Name: journal
           Type: MyISAM
     Row_format: Dynamic
           Rows: 417
 Avg_row_length: 553
    Data_length: 230848
Max_data_length: 4294967295
   Index_length: 5120
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2001-12-09 23:18:06
    Update_time: 2002-06-16 22:20:13
     Check_time: 2002-05-19 17:03:35



The script grabs the Name and Data_length fields for each table and stores them in the @table_sizes list. Once that data has been gathered, the script calls the snapshot command. Finally, it unlocks the tables and prints the list of tables and sizes (sorted by size).

Running the script produces output like this:


9300388448  Datascope.SymbolHistory
1458868716  Chart.SymbolHistory
773481608   logs.pfs
749644404   IDX.LinkLog
457454228   SEC.SEC_Filings
442951712   IDX.BusinessWireArticles
343099968   Datascope.Symbols
208388096   IDX.Headlines


As expected, the largest tables are listed first, regardless of which databases they reside in.

There are many ways the script can be improved or enhanced. It could:

  • Perform more error checking.

  • Compare the current table sizes with those from the previous run.

  • Notice whether a table has grown beyond a preset threshold.

  • Ignore Heap tables, since they don't reside on disk.

None of those enhancements are particularly difficult. With even a basic grasp of Perl and a bit of time, you can transform the script into something custom-tailored to your needs.
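For instance, skipping Heap tables and flagging tables that grow past a preset threshold might be sketched as follows. This is an illustrative fragment, not part of the original script: the 200 MB threshold is arbitrary, and the @table_info data here stands in for the rows the real script reads via SHOW TABLE STATUS:

```perl
#!/usr/bin/perl -w
use strict;

my $threshold = 200 * 1024 * 1024;   # hypothetical 200 MB alert threshold
my @table_sizes;
my @warnings;

# Stand-in for the rows returned by SHOW TABLE STATUS.
my @table_info = (
    { Name => 'journal',  Type => 'MyISAM', Data_length => 230_848     },
    { Name => 'sessions', Type => 'HEAP',   Data_length => 1_048_576   },
    { Name => 'history',  Type => 'MyISAM', Data_length => 262_144_000 },
);

for my $table (@table_info)
{
    next if $table->{Type} eq 'HEAP';   # in-memory tables don't reside on disk

    push @table_sizes, [ $table->{Name}, $table->{Data_length} ];

    if ($table->{Data_length} > $threshold)
    {
        push @warnings, "$table->{Name} has grown past $threshold bytes";
    }
}

print "$_\n" for @warnings;
```

Dropping this logic into the script's inner loop, and mailing @warnings along with the size report, gives the backup administrator early notice of runaway tables.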
