I l@ve RuBoard Previous Section Next Section

11.2 Transferring Files over the Net

As we saw in the previous chapter, sockets see plenty of action on the Net. For instance, the getfile example at the end of that chapter allowed us to transfer entire files between machines. In practice, though, higher-level protocols are behind much of what happens on the Net. Protocols run on top of sockets, but hide much of the complexity of the network scripting examples we've just seen.

FTP -- the File Transfer Protocol -- is one of the more commonly used Internet protocols. It defines a higher-level conversation model that is based on exchanging command strings and file contents over sockets. By using FTP, we can accomplish the same task as the prior chapter's getfile script, but the interface is simpler, and standard -- FTP lets us ask for files from any server machine that supports FTP, without requiring that it run our custom getfile script. FTP also supports more advanced operations such as uploading files to the server, getting remote directory listings, and more.

Really, FTP runs on top of two sockets: one for passing control commands between client and server (port 21), and another for transferring bytes. By using a two-socket model, FTP avoids the possibility of deadlocks (i.e., transfers on the data socket do not block dialogs on the control socket). Ultimately, though, Python's ftplib support module allows us to upload and download files at a remote server machine by FTP, without dealing in raw socket calls or FTP protocol details.

11.2.1 FTP: Fetching Python with Python

Because the Python FTP interface is so easy to use, let's jump right into a realistic example. The script in Example 11-1 automatically fetches and builds Python with Python. No, this isn't a recursive chicken-and-egg thought exercise -- you must already have installed Python to run this program. More specifically, this Python script does the following:

  1. Downloads the Python source distribution by FTP

  2. Unpacks and compiles the distribution into a Python executable

The download portion will run on any machine with Python and sockets; the unpacking and compiling code assumes a Unix-like build environment as coded here, but could be tweaked to work with other platforms.

Example 11-1. PP2E\Internet\Ftp\getpython.py
#!/usr/local/bin/python
###############################################################
# A Python script to download and build Python's source code.
# Uses ftplib, the ftp protocol handler which uses sockets.
# Ftp runs on 2 sockets (one for data, one for control--on
# ports 20 and 21) and imposes message text formats, but the 
# Python ftplib module hides most of this protocol's details.
###############################################################

import os
from ftplib import FTP                   # socket-based ftp tools
Version = '1.5'                          # version to download
tarname = 'python%s.tar.gz' % Version    # remote/local file name
 
print 'Connecting...'
localfile  = open(tarname, 'wb')         # where to store download
connection = FTP('ftp.python.org')       # connect to ftp site
connection.login()                       # default is anonymous login
connection.cwd('pub/python/src')         # xfer 1k at a time to localfile

print 'Downloading...'
connection.retrbinary('RETR ' + tarname, localfile.write, 1024)
connection.quit()
localfile.close()
 
print 'Unpacking...'
os.system('gzip -d '  + tarname)         # decompress
os.system('tar -xvf ' + tarname[:-3])    # strip .gz 

print 'Building...'
os.chdir('Python-' + Version)            # build Python itself
os.system('./configure')                 # assumes unix-style make
os.system('make')
os.system('make test')
print 'Done: see Python-%s/python.' % Version

Most of the FTP protocol details are encapsulated by the Python ftplib module imported here. This script uses some of the simplest interfaces in ftplib (we'll see others in a moment), but they are representative of the module in general:

connection = FTP('ftp.python.org')       # connect to ftp site

To open a connection to a remote (or local) FTP server, create an instance of the ftplib.FTP object, passing in the name (domain or IP-style) of the machine you wish to connect to. Assuming this call doesn't throw an exception, the resulting FTP object exports methods that correspond to the usual FTP operations. In fact, Python scripts act much like typical FTP client programs -- just replace commands you would normally type or select with method calls:

connection.login()                       # default is anonymous login
connection.cwd('pub/python/src')         # xfer 1k at a time to localfile

Once connected, we log in, and go to the remote directory we want to fetch a file from. The login method allows us to pass in additional optional arguments to specify a username and password; by default it performs anonymous FTP:

connection.retrbinary('RETR ' + tarname, localfile.write, 1024)
connection.quit()

Once we're in the target directory, we simply call the retrbinary method to download the target server file in binary mode. The retrbinary call will take awhile to complete, since it must download a big file. It gets three arguments:

  • An FTP command string -- here, a string RETR filename, which is the standard format for FTP retrievals.

  • A function or method to which Python passes each chunk of the downloaded file's bytes -- here, the write method of a newly created and opened local file.

  • A size for those chunks of bytes -- here, 1024 bytes are downloaded at a time, but the default is reasonable if this argument is omitted.

Because this script creates a local file named localfile, of the same name as the remote file being fetched, and passes its write method to the FTP retrieval method, the remote file's contents will automatically appear in a local, client-side file after the download is finished. By the way, notice that this file is opened in "wb" binary output mode; if this script is run on Windows, we want to avoid automatically expanding and \n bytes into \r\n byte sequences (that happens automatically on Windows when writing files opened in "w" text mode).

Finally, we call the FTP quit method to break the connection with the server and manually close the local file to force it to be complete before it is further processed by the shell commands spawned by os.system (it's not impossible that parts of the file are still held in buffers before the close call):

connection.quit()
localfile.close()

And that's all there is to it; all the FTP, socket, and networking details are hidden behind the ftplib interface module. Here is this script in action on a Linux machine, with a couple thousand output lines cut in the interest of brevity:

[lutz@starship test]$ python getpython.py 
Connecting...
Downloading...
Unpacking...
Python-1.5/
Python-1.5/Doc/
Python-1.5/Doc/ref/
Python-1.5/Doc/ref/.cvsignore
Python-1.5/Doc/ref/fixps.py
...
 ...lots of tar lines deleted...
...
Python-1.5/Tools/webchecker/webchecker.py
Python-1.5/Tools/webchecker/websucker.py
Building...
creating cache ./config.cache
checking MACHDEP... linux2
checking CCC...
checking for --without-gcc... no
checking for gcc... gcc
...
 ...lots of build lines deleted...
...
Done: see Python-1.5/python.

[lutz@starship test]$ cd Python-1.5/ 
[lutz@starship Python-1.5]$ ./python 
Python 1.5 (#1, Jul 12 2000, 12:35:52)  [GCC egcs-2.91.66 19990314/Li on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> print 'The Larch!'
The Larch!

Such a script could be automatically executed at regular intervals (e.g., by a Unix cron job) to update a local Python install with a fresh build. But the thing to notice here is that this otherwise typical Python script fetches information from an arbitrarily remote FTP site and machine. Given an Internet link, any information published by an FTP server on the Net can be fetched by and incorporated into Python scripts using interfaces such as these.

11.2.1.1 Using urllib to FTP files

In fact, FTP is just one way to transfer information across the Net, and there are more general tools in the Python library to accomplish the prior script's download. Perhaps the most straightforward is the Python urllib module: given an Internet address string -- a URL, or Universal Resource Locator -- this module opens a connection to the specified server and returns a file-like object ready to be read with normal file object method calls (e.g., read, readlines).

We can use such a higher-level interface to download anything with an address on the Web -- files published by FTP sites (using URLs that start with "ftp://"), web pages and outputs of scripts that live on remote servers (using "http://" URLs), local files (using "file://" URLs), Gopher server data, and more. For instance, the script in Example 11-2 does the same as the one in Example 11-1, but it uses the general urllib module to fetch the source distribution file, instead of the protocol-specific ftplib.

Example 11-2. PP2E\Internet\Ftp\getpython-urllib.py
#!/usr/local/bin/python
###################################################################
# A Python script to download and build Python's source code
# use higher-level urllib instead of ftplib to fetch file
# urllib supports ftp, http, and gopher protocols, and local files
# urllib also allows downloads of html pages, images, text, etc.;
# see also Python html/xml parsers for web pages fetched by urllib;
###################################################################

import os
import urllib                           # socket-based web tools
Version = '1.5'                         # version to download
tarname = 'python%s.tar.gz' % Version   # remote/local file name
 
remoteaddr = 'ftp://ftp.python.org/pub/python/src/' + tarname
print 'Downloading', remoteaddr

# this works too:
# urllib.urlretrieve(remoteaddr, tarname)

remotefile = urllib.urlopen(remoteaddr)     # returns input file-like object
localfile  = open(tarname, 'wb')            # where to store data locally
localfile.write(remotefile.read())
localfile.close()
remotefile.close()
 
# the rest is the same
execfile('buildPython.py')

Don't sweat the details of the URL string used here; we'll talk much more about URLs in the next chapter. We'll also use urllib again in this and later chapters to fetch web pages, format generated URL strings, and get the output of remote scripts on the Web.[1] Technically speaking, urllib supports a variety of Internet protocols (HTTP, FTP, Gopher, and local files), is only used for reading remote objects (not writing or uploading them), and retrievals must generally be run in threads if blocking is a concern. But the basic interface shown in this script is straightforward. The call:

[1] For more urllib download examples, see the section on HTTP in this chapter. In bigger terms, tools like urllib.urlopen allow scripts to both download remote files and invoke programs that are located on a remote server machine. In Chapter 12, we'll also see that urllib includes tools for formatting (escaping) URL strings for safe transmission.

remotefile = urllib.urlopen(remoteaddr)     # returns input file-like object

contacts the server named in the remoteaddr URL string and returns a file-like object connected to its download stream (an FTP-based socket). Calling this file's read method pulls down the file's contents, which are written to a local client-side file. An even simpler interface:

urllib.urlretrieve(remoteaddr, tarname)

also does the work of opening a local file and writing the downloaded bytes into it -- things we do manually in the script as coded. This comes in handy if we mean to download a file, but is less useful if we want to process its data immediately.

Either way, the end result is the same: the desired server file shows up on the client machine. The remainder of the script -- unpacking and building -- is identical to the original version, so it's been moved to a reusable Python file run with the execfile built-in (recall that execfile runs a file as though its code were pasted into the place where the execfile appears). The script is shown in Example 11-3.

Example 11-3. PP2E\Internet\Ftp\buildPython.py
#!/usr/local/bin/python
###############################################################
# A Python script to build Python from its source code.
# Run me in directory where Python source distribution lives.
###############################################################

import os
Version = '1.5'                         # version to build
tarname = 'python%s.tar.gz' % Version   # remote/local file name
 
print 'Unpacking...'
os.system('gzip -d '  + tarname)        # decompress file
os.system('tar -xvf ' + tarname[:-3])   # untar without '.gz'

print 'Building...'
os.chdir('Python-' + Version)           # build Python itself
os.system('./configure')                # assumes unix-style make
os.system('make')
os.system('make test')
print 'Done: see Python-%s/python.' % Version

The output this time is almost identical to the output of Example 11-1, so I'll show only a few portions (the gzip message appears if you don't delete a tar file left by a run in the past):

[lutz@starship test]$ python getpython-urllib.py 
Downloading ftp://ftp.python.org/pub/python/src/python1.5.tar.gz
Unpacking...
gzip: python1.5.tar already exists; do you wish to overwrite (y or n)? y
 ...tar lines... 
Building...
 ...build lines...
Done: see Python-1.5/python.

[lutz@starship test]$ python buildPython.py 
Unpacking...
 ...tar and build lines...

In fact, although the original script is all top-level code that runs immediately and accomplishes only one task, there really are two potentially reusable activities within it: fetching a file and building Python from source. By splitting each part off into a module of its own, we can reuse its program logic in other contexts, which naturally leads us to the topic in the next section.

11.2.2 FTP get and put Utilities

Almost invariably, when I present the ftplib interfaces in Python classes, students ask why programmers need to supply the RETR string in the retrieval method. It's a good question -- the RETR string is the name of the download command in the FTP protocol, but ftplib is supposed to encapsulate that protocol. As we'll see in a moment, we have to supply an arguably odd STOR string for uploads as well. It's boilerplate code that you accept on faith once you see it, but that begs the question. You could always email Guido a proposed ftplib patch, but that's not really a good answer for beginning Python students.[2]

[2] This is one point in the class where I also usually threaten to write Guido's home phone number on the whiteboard. But that's generally an empty promise made just for comic effect. If you do want to discuss Python language issues, Guido's email address, as well as contact points for other Python core developers, are readily available on the Net. As someone who's gotten anonymous Python-related calls at home, I never do give out phone numbers (and dialing 1-800-Hi-Guido is only funny the first time).

A better answer is that there is no law against extending the standard library modules with higher-level interfaces of our own -- with just a few lines of reusable code, we can make the FTP interface look any way we want in Python. For instance, we could, once and for all, write utility modules that wrap the ftplib interfaces to hide the RETR string. If we place these utility modules in a directory on PYTHONPATH, they become just as accessible as ftplib itself, automatically reusable in any Python script we write in the future. Besides removing the RETR string requirement, a wrapper module could also make assumptions that simplify FTP operations into single function calls.

For instance, given a module that encapsulates and simplifies ftplib, our Python fetch-and-build script could be further reduced to the script shown in Example 11-4 -- essentially just a function call and file execution.

Example 11-4. PP2E\Internet\Ftp\getpython-modular.py
#!/usr/local/bin/python
################################################################
# A Python script to download and build Python's source code.
# Uses getfile.py, a utility module which encapsulates ftp step.
################################################################

import getfile
Version = '1.5'                         # version to download
tarname = 'python%s.tar.gz' % Version   # remote/local file name

# fetch with utility 
getfile.getfile(tarname, 'ftp.python.org', 'pub/python/src')

# rest is the same
execfile('buildPython.py')

Besides having a line count that is much more impressive to marketeers, the meat of this script has been split off into files for reuse elsewhere. If you ever need to download a file again, simply import an existing function rather than copying code with cut-and-paste editing. Changes in download operations would need to be made in only one file, not everywhere we've copied boilerplate code; getfile.getfile could even be changed to use urllib instead of ftplib without effecting any of its clients. It's good engineering.

11.2.2.1 Download utility

So just how would we go about writing such an FTP interface wrapper (he asks, knowingly)? Given the ftplib library module, wrapping downloads of a particular file in a particular directory is straightforward. Connected FTP objects support two download methods:

  • The retrbinary method downloads the requested file in binary mode, sending its bytes in chunks to a supplied function, without line-feed mapping. Typically, the supplied function is a write method of an open local file object, such that the bytes are placed in the local file on the client.

  • The retrlines method downloads the requested file in ASCII text mode, sending each line of text to a supplied function with all end-of-line characters stripped. Typically, the supplied function adds a \n newline (mapped appropriately for the client machine), and writes the line to a local file.

We will meet the retrlines method in a later example; the getfile utility module in Example 11-5 transfers in binary mode always with retrbinary. That is, files are downloaded exactly as they were on the server, byte for byte, with the server's line-feed conventions in text files. You may need to convert line-feeds after downloads if they look odd in your text editor -- see the converter tools in Chapter 5, for pointers.

Example 11-5. PP2E\Internet\Ftp\getfile.py
#!/usr/local/bin/python
################################################# 
# Fetch an arbitrary file by ftp.  Anonymous 
# ftp unless you pass a user=(name, pswd) tuple.
# Gets the Monty Python theme song by default.
#################################################
 
from ftplib  import FTP          # socket-based ftp tools
from os.path import exists       # file existence test

file = 'sousa.au'                # default file coordinates
site = 'ftp.python.org'          # monty python theme song
dir  = 'pub/python/misc'

def getfile(file=file, site=site, dir=dir, user=(), verbose=1, force=0):
    """
    fetch a file by ftp from a site/directory
    anonymous or real login, binary transfer
    """
    if exists(file) and not force:
        if verbose: print file, 'already fetched'
    else:
        if verbose: print 'Downloading', file
        local = open(file, 'wb')                # local file of same name
        try:
            remote = FTP(site)                  # connect to ftp site
            apply(remote.login, user)           # anonymous=() or (name, pswd)
            remote.cwd(dir)
            remote.retrbinary('RETR ' + file, local.write, 1024)
            remote.quit()
        finally:
            local.close()                       # close file no matter what
        if verbose: print 'Download done.'      # caller handles exceptions

if __name__ == '__main__': getfile()            # anonymous python.org login

This module is mostly just a repackaging of the FTP code we used to fetch the Python source distribution earlier, to make it simpler and reusable. Because it is a callable function, the exported getfile.getfile here tries to be as robust and generally useful as possible, but even a function this small implies some design decisions. Here are a few usage notes:

FTP mode

The getfile function in this script runs in anonymous FTP mode by default, but a two-item tuple containing a username and password string may be passed to the user argument to log in to the remote server in non-anonymous mode. To use anonymous FTP, either don't pass the user argument or pass it an empty tuple, (). The FTP object login method allows two optional arguments to denote a username and password, and the apply call in Example 11-5 sends it whatever argument tuple you pass to user.

Processing modes

If passed, the last two arguments (verbose, force) allow us to turn off status messages printed to the stdout stream (perhaps undesirable in a GUI context) and force downloads to happen even if the file already exists locally (the download overwrites the existing local file).

Exception protocol

The caller is expected to handle exceptions; this function wraps downloads in a try/finally statement to guarantee that the local output file is closed, but lets exceptions propagate. If used in a GUI or run from a thread, for instance, exceptions may require special handling unknown in this file.

Self-test

If run standalone, this file downloads a sousa.au audio file from http://www.python.org as a self-test, but the function will normally be passed FTP filenames, site names, and directory names as well.

File mode

This script is careful to open the local output file in "wb" binary mode to suppress end-line mapping, in case it is run on Windows. As we learned in Chapter 2, it's not impossible that true binary data files may have bytes whose value is equal to a \n line-feed character; opening in "w" text mode instead would make these bytes be automatically expanded to a \r\n two-byte sequence when written locally on Windows. This is only an issue for portability to Windows (mode "w" works elsewhere). Again, see Chapter 5 for line-feed converter tools.

Directory model

This function currently uses the same filename to identify both the remote file and the local file where the download should be stored. As such, it should be run in the directory where you want the file to show up; use os.chdir to move to directories if needed. (We could instead assume filename is the local file's name, and strip the local directory with os.path.split to get the remote name, or accept two distinct filename arguments -- local and remote.)

Notice also that, despite its name, this module is very different than the getfile.py script we studied at the end of the sockets material in the previous chapter. The socket-based getfile implemented client and server-side logic to download a server file to a client machine over raw sockets.

This new getfile here is a client-side tool only. Instead of raw sockets, it uses the simpler FTP protocol to request a file from a server; all socket-level details are hidden in the ftplib module's implementation of the FTP client protocol. Furthermore, the server here is a perpetually running program on the server machine, which listens for and responds to FTP requests on a socket, on the dedicated FTP port (number 21). The net functional effect is that this script requires an FTP server to be running on the machine where the desired file lives, but such a server is much more likely to be available.

11.2.2.2 Upload utility

While we're at it, let's write a script to upload a single file by FTP to a remote machine. The upload interfaces in the FTP module are symmetric with the download interfaces. Given a connected FTP object:

  • Its storbinary method can be used to upload bytes from an open local file object.

  • Its storlines method can be used to upload text in ASCII mode from an open local file object.

Unlike the download interfaces, both of these methods are passed a file object as a whole, not a file object method (or other function). We will meet the storlines method in a later example. The utility module in Example 11-6 uses storbinary such that the file whose name is passed in is always uploaded verbatim -- in binary mode, without line-feed translations for the target machine's conventions. If this script uploads a text file, it will arrive exactly as stored on the machine it came from, client line-feed markers and all.

Example 11-6. PP2E\Internet\Ftp\putfile.py
#!/usr/local/bin/python
################################################## 
# Store an arbitrary file by ftp.  Anonymous 
# ftp unless you pass a user=(name, pswd) tuple.
##################################################
 
import ftplib                    # socket-based ftp tools

file = 'sousa.au'                # default file coordinates
site = 'starship.python.net'     # monty python theme song
dir  = 'upload'

def putfile(file=file, site=site, dir=dir, user=(), verbose=1):
    """
    store a file by ftp to a site/directory
    anonymous or real login, binary transfer
    """
    if verbose: print 'Uploading', file
    local  = open(file, 'rb')               # local file of same name
    remote = ftplib.FTP(site)               # connect to ftp site
    apply(remote.login, user)               # anonymous or real login
    remote.cwd(dir)
    remote.storbinary('STOR ' + file, local, 1024)
    remote.quit()
    local.close()
    if verbose: print 'Upload done.'

if __name__ == '__main__':
    import sys, getpass
    pswd = getpass.getpass(site + ' pswd?')          # filename on cmdline
    putfile(file=sys.argv[1], user=('lutz', pswd))   # non-anonymous login

Notice that for portability, the local file is opened in "rb" binary mode this time to suppress automatic line-feed character conversions in case this is run on Windows; if this is binary information, we don't want any bytes that happen to have the value of the \r carriage-return character to mysteriously go away during the transfer.

Also observe that the standard Python getpass.getpass is used to ask for an FTP password in standalone mode. Like the raw_input built-in function, this call prompts for and reads a line of text from the console user; unlike raw_input, getpass does not echo typed characters on the screen at all (in fact, on Windows it uses the low-level direct keyboard interface we met in the stream redirection section of Chapter 2). This comes in handy for protecting things like passwords from potentially prying eyes.

Like the download utility, this script uploads a local copy of an audio file by default as a self-test, but you will normally pass in real remote filename, site name, and directory name strings. Also like the download utility, you may pass a (username, password) tuple to the user argument to trigger non-anonymous FTP mode (anonymous FTP is the default).

11.2.2.3 Playing the Monty Python theme song

Wake up -- it's time for a bit of fun. Let's make use of these scripts to transfer and play the Monty Python theme song audio file maintained at Python's web site. First off, let's write a module that downloads and plays the sample file, as shown in Example 11-7.

Example 11-7. PP2E\Internet\Ftp\sousa.py
#!/usr/local/bin/python
################################################# 
# Usage: % sousa.py
# Fetch and play the Monty Python theme song.
# This may not work on your system as is: it 
# requires a machine with ftp access, and uses
# audio filters on Unix and your .au player on 
# Windows.  Configure playfile.py as needed.
#################################################
 
import os, sys
from PP2E.Internet.Ftp.getfile  import getfile
from PP2E.Internet.Ftp.playfile import playfile
sample = 'sousa.au'

getfile(sample)    # fetch audio file by ftp
playfile(sample)   # send it to audio player

This script will run on any machine with Python, an Internet link, and a recognizable audio player; it works on my Windows laptop with a dialup Internet connection (if I could insert an audio file hyperlink here to show what it sounds like, I would):

C:\...\PP2E\Internet\Ftp>python sousa.py
Downloading sousa.au
Download done.

C:\...\PP2E\Internet\Ftp>python sousa.py
sousa.au already fetched

The getfile and putfile modules can be used to move the sample file around, too. Both can either be imported by clients that wish to use their functions, or run as top-level programs to trigger self-tests. Let's run these scripts from a command line and the interactive prompt to see how they work. When run standalone, parameters are passed in the command line, and the default file settings are used:

C:\...\PP2E\Internet\Ftp>python putfile.py sousa.au
starship.python.net pswd?
Uploading sousa.au
Upload done.

When imported, parameters are passed explicitly to functions:

C:\...\PP2E\Internet\Ftp>python
>>> from getfile import getfile
>>> getfile(file='sousa.au', site='starship.python.net', dir='upload',
...                          user=('lutz', '****'))
Downloading sousa.au
Download done.
>>> from playfile import playfile
>>> playfile('sousa.au')

I've left one piece out of the puzzle: all that's left is to write a module that attempts to play an audio file portably (see Example 11-8). Alas, this is the least straightforward task because audio players vary per platform. On Windows, the following module uses the DOS start command to launch whatever you have registered to play audio files (exactly as if you had double-clicked on the file's icon in a file explorer); on the Windows 98 side of my Sony notebook machine, this DOS command line:

C:\...\PP2E\Internet\Ftp>python playfile.py sousa.au

pops up a media bar playing the sample. On Unix, it attempts to pass the audio file to a command-line player program, if one has been added to the unixfilter table -- tweak this for your system (cat 'ing audio files to /dev/audio works on some Unix systems, too). On other platforms, you'll need to do a bit more; there has been some work towards portable audio interfaces in Python, but it's notoriously platform-specific. Web browsers generally know how to play audio files, so passing the filename in a URL to a browser located via the LaunchBrowser.py script we met in Chapter 4, is perhaps a portable solution here as well (see that chapter for interface details).

Example 11-8. PP2E\Internet\Ftp\playfile.py
#!/usr/local/bin/python
################################################# 
# Try to play an arbitrary audio file.
# This may not work on your system as is; it
# uses audio filters on Unix, and filename 
# associations on Windows via the start command
# line (i.e., whatever you have on your machine 
# to run *.au files--an audio player, or perhaps
# a web browser); configure me as needed.  We
# could instead launch a web browser here, with 
# LaunchBrowser.py.  See also: Lib/audiodev.py.
#################################################

import os, sys
sample = 'sousa.au'  # default audio file

unixhelpmsg = """
Sorry: can't find an audio filter for your system!
Add an entry for your system to the "unixfilter" 
dictionary in playfile.py, or play the file manually.
"""

unixfilter = {'sunos5':  '/usr/bin/audioplay',
              'linux2':  '<unknown>',
              'sunos4':  '/usr/demo/SOUND/play'}

def playfile(sample=sample):
    """
    play an audio file: use name associations 
    on windows, filter command-lines elsewhere
    """
    if sys.platform[:3] == 'win':
        os.system('start ' + sample)   # runs your audio player
    else:
        if not (unixfilter.has_key(sys.platform) and 
                os.path.exists(unixfilter[sys.platform])):
            print unixhelpmsg
        else:
            theme = open(sample, 'r')             
            audio = os.popen(unixfilter[sys.platform], 'w')  # spawn shell tool
            audio.write(theme.read())                        # send to its stdin

if __name__ == '__main__': playfile()
11.2.2.4 Adding user interfaces

If you read the last chapter, you'll recall that it concluded with a quick look at scripts that added a user interface to a socket-based getfile script -- one that transferred files over a proprietary socket dialog, instead of FTP. At the end of that presentation, I mentioned that FTP is a much more generally useful way to move files around, because FTP servers are so widely available on the Net. For illustration purposes, Example 11-9 shows a simple mutation of the last chapter's user interface, implemented as a new subclass of the last chapter's general form builder.

Example 11-9. P2E\Internet\Ftp\getfilegui.py
###############################################################
# launch ftp getfile function with a reusable form gui class;
# uses os.chdir to goto target local dir (getfile currently
# assumes that filename has no local directory path prefix);
# runs getfile.getfile in thread to allow more than one to be 
# running at once and avoid blocking gui during downloads;
# this differs from socket-based getfilegui, but reuses Form;
# supports both user and anonymous ftp as currently coded;
# caveats: the password field is not displayed as stars here,
# errors are printed to the console instead of shown in the 
# gui (threads can't touch the gui on Windows), this isn't 
# 100% thread safe (there is a slight delay between os.chdir
# here and opening the local output file in getfile) and we 
# could display both a save-as popup for picking the local dir,
# and a remote directory listings for picking the file to get;
###############################################################

from Tkinter import Tk, mainloop
from tkMessageBox import showinfo
import getfile, os, sys, thread                 # ftp getfile here, not socket
from PP2E.Internet.Sockets.form import Form     # reuse form tool in socket dir

class FtpForm(Form):
    def __init__(self):
        root = Tk()
        root.title(self.title)
        labels = ['Server Name', 'Remote Dir', 'File Name', 
                  'Local Dir',   'User Name?', 'Password?']
        Form.__init__(self, labels, root)
        self.mutex = thread.allocate_lock()
        self.threads = 0
    def transfer(self, filename, servername, remotedir, userinfo):
        try:
            self.do_transfer(filename, servername, remotedir, userinfo)
            print '%s of "%s" successful'  % (self.mode, filename)
        except:
            print '%s of "%s" has failed:' % (self.mode, filename),
            print sys.exc_info()[0], sys.exc_info()[1]
        self.mutex.acquire()
        self.threads = self.threads - 1
        self.mutex.release()
    def onSubmit(self):
        Form.onSubmit(self)
        localdir   = self.content['Local Dir'].get()
        remotedir  = self.content['Remote Dir'].get()
        servername = self.content['Server Name'].get()
        filename   = self.content['File Name'].get()
        username   = self.content['User Name?'].get()
        password   = self.content['Password?'].get()
        userinfo   = ()
        if username and password:
            userinfo = (username, password)
        if localdir:
            os.chdir(localdir)
        self.mutex.acquire()
        self.threads = self.threads + 1
        self.mutex.release()
        ftpargs = (filename, servername, remotedir, userinfo)
        thread.start_new_thread(self.transfer, ftpargs)
        showinfo(self.title, '%s of "%s" started' % (self.mode, filename))
    def onCancel(self):
        if self.threads == 0:
            Tk().quit()
        else:
            showinfo(self.title, 
                     'Cannot exit: %d threads running' % self.threads)

class FtpGetfileForm(FtpForm):
    title = 'FtpGetfileGui'
    mode  = 'Download'
    def do_transfer(self, filename, servername, remotedir, userinfo):
        getfile.getfile(filename, servername, remotedir, userinfo, 0, 1)

if __name__ == '__main__':
    FtpGetfileForm()
    mainloop()    

If you flip back to the end of the previous chapter, you'll find that this version is similar in structure to its counterpart there; in fact, it has the same name (and is distinct only because it lives in a different directory). The class here, though, knows how to use the FTP-based getfile module from earlier in this chapter, instead of the socket-based getfile module we met a chapter ago. When run, this version also implements more input fields, as we see in Figure 11-1.

Figure 11-1. FTP getfile input form
figs/ppy2_1101.gif

Notice that a full file path is entered for the local directory here. Otherwise, the script assumes the current working directory, which changes after each download and can vary depending on where the GUI is launched (e.g., the current directory differs when this script is run by the PyDemos program at the top of the examples tree). When we click this GUI's Submit button (or press the Enter key), this script simply passes the form's input field values as arguments to the getfile.getfile FTP utility function shown earlier in this section. It also posts a pop-up to tell us the download has begun (Figure 11-2).

Figure 11-2. FTP getfile info pop-up
figs/ppy2_1102.gif

As currently coded, further download status messages from this point on show up in the console window; here are the messages for a successful download, as well as one that failed when I mistyped my password (no, it's not really "xxxxxx"):

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\temp
Password?       =>      xxxxxx
File Name       =>      index.html
Remote Dir      =>      public_html/home
Download of "index.html" successful

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\temp
Password?       =>      xxxxxx
File Name       =>      index.html
Remote Dir      =>      public_html/home
Download of "index.html" has failed: ftplib.error_perm 530 Login incorrect.

Given a username and password, the downloader logs into the specified account. To do anonymous FTP instead, leave the username and password fields blank. Let's start an anonymous FTP connection to fetch the Python source distribution; Figure 11-3 shows the filled-out form.

Figure 11-3. FTP getfile input form, anonymous FTP
figs/ppy2_1103.gif

Pressing Submit on this form starts a download running in the background as before; we get the pop-up shown in Figure 11-4 to verify the startup.

Figure 11-4. FTP getfile info pop-up
figs/ppy2_1104.gif

Now, to illustrate the threading capabilities of this GUI, let's start another download while this one is in progress. The GUI stays active while downloads are under way, so we simply change the input fields and press Submit again, as done in Figure 11-5.

Figure 11-5. FTP getfile input form, second thread
figs/ppy2_1105.gif

This second download starts in parallel with the one attached to ftp.python.org, because each download is run in a thread, and more than one Internet connection can be active at once. In fact, the GUI itself stays active during downloads only because downloads are run in threads; if they were not, even screen redraws wouldn't happen until a download finished.

We discussed threads in Chapter 3, but this script illustrates some practical thread concerns:

  • This program takes care to not do anything GUI-related in a download thread. At least in the current release on Windows, only the thread that makes GUIs can process them (a Windows-only rule that has nothing to do with Python or Tkinter).

  • To avoid killing spawned download threads on some platforms, the GUI must also be careful to not exit while any downloads are in progress. It keeps track of the number of in-progress threads, and just displays the pop-up in Figure 11-6 if we try to kill the GUI while both of these downloads are in progress by pressing the Cancel button.

Figure 11-6. FTP getfile busy pop-up
figs/ppy2_1106.gif

We'll see ways to work around the no-GUI rule for threads when we explore the PyMailGui example near the end of this chapter. To be portable, though, we can't really close the GUI until the active-thread count falls to zero. Here is the sort of output that appears in the console window for these two downloads:

C:\...\PP2E\Internet\Ftp>python getfilegui.py
User Name?      =>
Server Name     =>      ftp.python.org
Local Dir       =>      c:\temp
Password?       =>
File Name       =>      python1.5.tar.gz
Remote Dir      =>      pub/python/src

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\temp
Password?       =>      xxxxxx
File Name       =>      about-pp.html
Remote Dir      =>      public_html/home
Download of "about-pp.html" successful
Download of "python1.5.tar.gz" successful

This all isn't much more useful than a command-line-based tool, of course, but it can be easily modified by changing its Python code, and it provides enough of a GUI to qualify as a simple, first-cut FTP user interface. Moreover, because this GUI runs downloads in Python threads, more than one can be run at the same time from this GUI without having to start or restart a different FTP client tool.

While we're in a GUI mood, let's add a simple interface to the putfile utility, too. The script in Example 11-10 creates a dialog that starts uploads in threads. It's almost the same as the getfile GUI we just wrote, so there's nothing new to say. In fact, because get and put operations are so similar from an interface perspective, most of the get form's logic was deliberately factored out into a single generic class (FtpForm) such that changes need only be made in a single place. That is, the put GUI here is mostly just a reuse of the get GUI, with distinct output labels and transfer method. It's in a file by itself to make it easy to launch as a standalone program.

Example 11-10. PP2E\Internet\Ftp\putfilegui.py
###############################################################
# launch ftp putfile function with a reusable form gui class;
# see getfilegui for notes: most of the same caveats apply;
# the get and put forms have been factored into a single 
# class such that changes need only be made in one place;
###############################################################

from Tkinter import mainloop
import putfile, getfilegui

class FtpPutfileForm(getfilegui.FtpForm):
    title = 'FtpPutfileGui'
    mode  = 'Upload'
    def do_transfer(self, filename, servername, remotedir, userinfo):
        putfile.putfile(filename, servername, remotedir, userinfo, 0)

if __name__ == '__main__':
    FtpPutfileForm()
    mainloop()    

Running this script looks much like running the download GUI, because it's almost entirely the same code at work. Let's upload a couple of files from the client machine to the starship server; Figure 11-7 shows the state of the GUI while starting one.

Figure 11-7. FTP putfile input form
figs/ppy2_1107.gif

And here is the console window output we get when uploading two files in parallel; here again, uploads run in threads, so if we start a new upload before one in progress is finished, they overlap in time:

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\stuff\website\public_html
Password?       =>      xxxxxx
File Name       =>      about-pp2e.html
Remote Dir      =>      public_html

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\stuff\website\public_html
Password?       =>      xxxxxx
File Name       =>      about-ppr2e.html
Remote Dir      =>      public_html
Upload of "about-pp2e.html" successful
Upload of "about-ppr2e.html" successful

Finally, we can bundle up both GUIs in a single launcher script that knows how to start the get and put interfaces, regardless of which directory we are in when the script is started, and independent of the platform on which it runs. Example 11-11 shows this process.

Example 11-11. PP2E\Internet\Ftp\PyFtpGui.pyw
################################################################
# spawn ftp get and put guis no matter what dir I'm run from;
# os.getcwd is not necessarily the place this script lives;
# could also hard-code a path from $PP2EHOME, or guessLocation;
# could also do this but need the DOS popup for status messages:
# from PP2E.launchmodes import PortableLauncher
# PortableLauncher('getfilegui', '%s/getfilegui.py' % mydir)()
################################################################

import os, sys
from PP2E.Launcher import findFirst
mydir = os.path.split(findFirst(os.curdir, 'PyFtpGui.pyw'))[0]

if sys.platform[:3] == 'win':
    os.system('start %s/getfilegui.py' % mydir)
    os.system('start %s/putfilegui.py' % mydir)
else:
    os.system('python %s/getfilegui.py &' % mydir)
    os.system('python %s/putfilegui.py &' % mydir)

When this script is started, both the get and put GUIs appear as distinct, independently running programs; alternatively, we might attach both forms to a single interface. We could get much fancier than these two interfaces, of course. For instance, we could pop up local file selection dialogs, and we could display widgets that give status of downloads and uploads in progress. We could even list files available at the remote site in a selectable list box by requesting remote directory listings over the FTP connection. To learn how to add features like that, though, we need to move on to the next section.

11.2.3 Downloading Web Sites (Mirrors)

Once upon a time, Telnet was all I needed. My web site lived at an Internet Service Provider (ISP) that provided general and free Telnet access for all its customers. It was a simple time. All of my site's files lived only in one place -- at my account directory on my ISP's server machine. To make changes to web pages, I simply started a Telnet session connected to my ISP's machine and edited my web pages there online. Moreover, because Telnet sessions can be run from almost any machine with an Internet link, I was able to tweak my web pages everywhere -- from my PC, from machines I had access to on the training road, from archaic machines I played with when I was bored at my day job, and so on. Life was good.

But times have changed. Due to a security breach, my ISP made a blanket decision to revoke Telnet access from all of their customers (except, of course, those who elected to pay a substantial premium to retain it). Seemingly, we weren't even supposed to have known about Telnet in the first place. As a replacement, the ISP mandated that all Telnet-inclined users should begin maintaining web page files locally on their own machines, and upload them by FTP after every change.

That's nowhere near as nice as editing files kept in a single place from almost any computer on the planet, of course, and this triggered plenty of complaints and cancellations among the technically savvy. Unfortunately, the technically savvy is a financially insignificant subset; more to the point, my web page's address had by this time been published in multiple books sold around the world, so changing ISPs would have been no less painful than changing update procedures.

After the shouting, it dawned on me that Python could help here: by writing Python scripts to automate the upload and download tasks associated with maintaining my web site on my PC, I could at least get back some of the mobility and ease of use that I'd lost. Because Python FTP scripts will work on any machine with sockets, I could run them both on my PC and on nearly any other computer where Python was installed. Furthermore, the same scripts used to transfer page files to and from my PC could be used to copy ("mirror") my site to another web server as a backup copy, should my ISP experience an outage (trust me -- it happens).

The following two scripts were born of all of the above frustrations. The first, mirrorflat.py, automatically downloads (i.e., copies) by FTP all the files in a directory at a remote site, to a directory on the local machine. I keep the main copy of my web site files on my PC these days, but really use this script in two ways:

  • To download my web site to client machines where I want to make edits, I fetch the contents of my public_html web directory of my account on my ISP's machine.

  • To mirror my site to my account on the starship.python.net server, I run this script periodically from a Telnet session on the starship machine (as I wrote this, starship still clung to the radical notion that users are intelligent enough to run Telnet).

More generally, this script (shown in Example 11-12) will download a directory full of files to any machine with Python and sockets, from any machine running an FTP server.

Example 11-12. PP2E\Internet\Ftp\mirrorflat.py
#!/bin/env python 
###########################################################
# use ftp to copy (download) all files from a remote site
# and directory to a directory on the local machine; e.g., 
# run me periodically to mirror a flat ftp site;
###########################################################

import os, sys, ftplib
from getpass import getpass

remotesite = 'home.rmi.net'
remotedir  = 'public_html'
remoteuser = 'lutz'
remotepass = getpass('Please enter password for %s: ' % remotesite)
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
if sys.platform[:3] == 'win': raw_input() # clear stream
cleanall   = raw_input('Clean local directory first? ')[:1] in ['y', 'Y']

print 'connecting...'
connection = ftplib.FTP(remotesite)                 # connect to ftp site
connection.login(remoteuser, remotepass)            # login as user/password
connection.cwd(remotedir)                           # cd to directory to copy

if cleanall:
    for localname in os.listdir(localdir):          # try to delete all locals
        try:                                        # first to remove old files
            print 'deleting local', localname
            os.remove(os.path.join(localdir, localname))
        except:
            print 'cannot delete local', localname

count = 0                                           # download all remote files
remotefiles = connection.nlst()                     # nlst() gives files list
                                                    # dir()  gives full details
for remotename in remotefiles:
    localname = os.path.join(localdir, remotename) 
    print 'copying', remotename, 'to', localname
    if remotename[-4:] == 'html' or remotename[-3:] == 'txt':
        # use ascii mode xfer
        localfile = open(localname, 'w')
        callback  = lambda line, file=localfile: file.write(line + '\n')
        connection.retrlines('RETR ' + remotename, callback)
    else:
        # use binary mode xfer
        localfile = open(localname, 'wb')
        connection.retrbinary('RETR ' + remotename, localfile.write)
    localfile.close()
    count = count+1

connection.quit()
print 'Done:', count, 'files downloaded.'

There is not a whole lot new to speak of in this script, compared to other FTP examples we've seen thus far. We open a connection with the remote FTP server, log in with a username and password for the desired account (this script never uses anonymous FTP), and go to the desired remote directory. New here, though, are loops to iterate over all the files in local and remote directories, text-based retrievals, and file deletions:

Deleting all local files

This script has a cleanall option, enabled by interactive prompt. If selected, the script first deletes all the files in the local directory before downloading, to make sure there are no extra files there that aren't also on the server (there may be junk here from a prior download). To delete local files, the script calls os.listdir to get a list of filenames in the directory, and os.remove to delete each; see Chapter 2 earlier in this book (or the Python library manual) for more details if you've forgotten what these calls do.

Notice the use of os.path.join to concatenate a directory path and filename according to the host platform's conventions; os.listdir returns filenames without their directory paths, and this script is not necessarily run in the local directory where downloads will be placed. The local directory defaults to the current directory ("."), but can be set differently with a command-line argument to the script.

Fetching all remote files

To grab all the files in a remote directory, we first need a list of their names. The FTP object's nlst method is the remote equivalent of os.listdir: nlist returns a list of the string names of all files in the current remote directory. Once we have this list, we simply step through it in a loop, running FTP retrieval commands for each filename in turn (more on this in a minute).

The nlst method is, more or less, like requesting a directory listing with an ls command in typical interactive FTP programs, but Python automatically splits up the listing's text into a list of filenames. We can pass it a remote directory to be listed; by default it lists the current server directory. A related FTP method, dir, returns the list of line strings produced by an FTP LIST command; its result is like typing a dir command in an FTP session, and its lines contain complete file information, unlike nlst. If you need to know more about all the remote files, parse the result of a dir method call.

Text-based retrievals

To keep line-feeds in sync with the machines that my web files live on, this script distinguishes between binary and text files. It uses a simple heuristic to do so: filenames ending in .html or .txt are assumed to be ASCII text data (HTML web pages and simple text files), and all others are assumed to be binary files (e.g., GIF and JPEG images, audio files, tar archives). This simple rule won't work for every web site, but it does the trick at mine.

Binary files are pulled down with the retrbinary method we met earlier and a local open mode of "wb" to suppress line-feed byte mapping (this script may be run on Windows or Unix-like platforms). We don't use a chunk size third argument here, though -- it defaults to a reasonable 8K if omitted.

For ASCII text files, the script instead uses the retrlines method, passing in a function to be called for each line in the text file downloaded. The text line handler function mostly just writes the line to a local file. But notice that the handler function created by the lambda here also adds an \n newline character to the end of the line it is passed. Python's retrlines method strips all line-feed characters from lines to side-step platform differences. By adding an \n, the script is sure to add the proper line-feed marker character sequence for the local platform on which this script runs (\n or \r\n). For this automapping of the \n in the script to work, of course, we must also open text output files in "w" text mode, not "wb" -- the mapping from \n to \r\n on Windows happens when data is written to the file.

All of this is simpler in action than in words. Here is the command I use to download my entire web site from my ISP server account to my Windows 98 laptop PC, in a single step:

C:\Stuff\Website\public_html>python %X%\internet\ftp\mirrorflat.py 
Please enter password for home.rmi.net:
Clean local directory first?
connecting...
copying UPDATES to .\UPDATES
copying PythonPowered.gif to .\PythonPowered.gif
copying Pywin.gif to .\Pywin.gif
copying PythonPoweredAnim.gif to .\PythonPoweredAnim.gif
copying PythonPoweredSmall.gif to .\PythonPoweredSmall.gif
copying about-hopl.html to .\about-hopl.html
copying about-lp.html to .\about-lp.html
...
 ...lines deleted...
...
copying training.html to .\training.html
copying trainingCD.GIF to .\trainingCD.GIF
copying uk-1.jpg to .\uk-1.jpg
copying uk-2.jpg to .\uk-2.jpg
copying uk-3.jpg to .\uk-3.jpg
copying whatsnew.html to .\whatsnew.html
copying whatsold.html to .\whatsold.html
copying xlate-lp.html to .\xlate-lp.html
copying uploadflat.py to .\uploadflat.py
copying ora-lp-france.gif to .\ora-lp-france.gif
Done: 130 files downloaded.

This can take awhile to complete (it's bound by network speed constraints), but it is much more accurate and easy than downloading files by hand. The script simply iterates over all the remote files returned by the nlst method, and downloads each with the FTP protocol (i.e., over sockets) in turn. It uses text transfer mode for names that imply obviously text data, and binary mode for others.

With the script running this way, I make sure the initial assignments in it reflect the machines involved, and then run the script from the local directory where I want the site copy to be stored. Because the download directory is usually not where the script lives, I need to give Python the full path to the script file (%X% evaluates a shell variable containing the top-level path to book examples on my machine). When run on the starship server in a Telnet session window, the execution and script directory paths are different, but the script works the same way.

If you elect to delete local files in the download directory, you may also see a batch of "deleting local..." messages scroll by on the screen before any "copying..." lines appear:

...
deleting local uploadflat.py
deleting local whatsnew.html
deleting local whatsold.html
deleting local xlate-lp.html
deleting local old-book.html
deleting local about-pp2e.html
deleting local about-ppr2e.html
deleting local old-book2.html
deleting local mirrorflat.py
...
copying about-pp-japan.html to ./about-pp-japan.html
copying about-pp.html to ./about-pp.html
copying about-ppr-germany.html to ./about-ppr-germany.html
copying about-ppr-japan.html to ./about-ppr-japan.html
copying about-ppr-toc.html to ./about-ppr-toc.html
...

By the way, if you botch the input of the remote site password, a Python exception is raised; I sometimes need to run again (and type slower):

C:\Stuff\Website\public_html>python %X%\internet\ftp\mirrorflat.py
Please enter password for home.rmi.net:
Clean local directory first?
connecting...
Traceback (innermost last):
  File "C:\PP2ndEd\examples\PP2E\internet\ftp\mirrorflat.py", line 20, in ?
    connection.login(remoteuser, remotepass)            # login as user/pass..
  File "C:\Program Files\Python\Lib\ftplib.py", line 316, in login
    if resp[0] == '3': resp = self.sendcmd('PASS ' + passwd)
  File "C:\Program Files\Python\Lib\ftplib.py", line 228, in sendcmd
    return self.getresp()
  File "C:\Program Files\Python\Lib\ftplib.py", line 201, in getresp
    raise error_perm, resp
ftplib.error_perm: 530 Login incorrect.

It's worth noting that this script is at least partially configured by assignments near the top of the file. In addition, the password and deletion options are given by interactive inputs, and one command-line argument is allowed -- the local directory name to store the downloaded files (it defaults to ".", the directory where the script is run). Command-line arguments could be employed to universally configure all the other download parameters and options, too; but because of Python's simplicity and lack of compile/link steps, changing settings in the text of Python scripts is usually just as easy as typing words on a command line.

Windows input note : If you study the previous code closely, you'll notice that an extra raw_input call is made on Windows only, after the getpass password input call and before the cleanall option setting is input. This is a workaround for what seems like a bug in Python 1.5.2 for Windows.

Oddly, the Windows port sometimes doesn't synchronize command-line input and output streams as expected. Here, this seems to be due to a getpass bug or constraint -- because getpass uses the low-level msvcrt keyboard interface module we met in Chapter 2, it appears to not mix well with the stdin stream buffering used by raw_input, and botches the input stream in the process. The extra raw_input clears the input stream (sys.stdin.flush doesn't help).

In fact, without the superfluous raw_input for Windows, this script prompts for cleanall option input, but never stops to let you type a reply! This effectively disables cleanall altogether. To force distinct input and output lines and correct raw_input behavior, some scripts in this book run extra print statements or raw_input calls to sync up streams before further user interaction. There may be other fixes, and this may be improved in future releases; try this script without the extra raw_input to see if this has been repaired in your Python.

11.2.4 Uploading Web Sites

Uploading a full directory is symmetric to downloading: it's mostly a matter of swapping the local and remote machines and operations in the program we just met. The script in Example 11-13 uses FTP to copy all files in a directory on the local machine on which it runs, up to a directory on a remote machine.

I really use this script, too, most often to upload all of the files maintained on my laptop PC to my ISP account in one fell swoop. I also sometimes use it to copy my site from my PC to its starship mirror machine, or from the mirror machine back to my ISP. Because this script runs on any computer with Python and sockets, it happily transfers a directory from any machine on the Net to any machine running an FTP server. Simply change the initial setting in this module as appropriate for the transfer you have in mind.

Example 11-13. PP2E\Internet\Ftp\uploadflat.py
#!/bin/env python 
##########################################################################
# use ftp to upload all files from a local dir to a remote site/directory;
# e.g., run me to copy a web/ftp site's files from your PC to your ISP;
# assumes a flat directory upload: uploadall.py does nested directories.
# to go to my ISP, I change setting to 'home.rmi.net', and 'public_html'.
##########################################################################

import os, sys, ftplib, getpass

remotesite = 'starship.python.net'                  # upload to starship site
remotedir  = 'public_html/home'                     # from win laptop or other
remoteuser = 'lutz'
remotepass = getpass.getpass('Please enter password for %s: ' % remotesite)
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
if sys.platform[:3] == 'win': raw_input()           # clear stream
cleanall   = raw_input('Clean remote directory first? ')[:1] in ['y', 'Y']

print 'connecting...'
connection = ftplib.FTP(remotesite)                 # connect to ftp site
connection.login(remoteuser, remotepass)            # login as user/password
connection.cwd(remotedir)                           # cd to directory to copy

if cleanall:
    for remotename in connection.nlst():            # try to delete all remotes
        try:                                        # first to remove old files
            print 'deleting remote', remotename
            connection.delete(remotename)           
        except:                                     
            print 'cannot delete remote', remotename

count = 0
localfiles = os.listdir(localdir)                   # upload all local files
                                                    # listdir() strips dir path
for localname in localfiles:  
    localpath = os.path.join(localdir, localname) 
    print 'uploading', localpath, 'to', localname
    if localname[-4:] == 'html' or localname[-3:] == 'txt':
        # use ascii mode xfer
        localfile = open(localpath, 'r')
        connection.storlines('STOR ' + localname, localfile)
    else:
        # use binary mode xfer
        localfile = open(localpath, 'rb')
        connection.storbinary('STOR ' + localname, localfile, 1024)
    localfile.close()
    count = count+1

connection.quit()
print 'Done:', count, 'files uploaded.'

Like the mirror download script, the program here illustrates a handful of new FTP interfaces and a set of FTP scripting techniques:

Deleting all remote files

Just like the mirror script, the upload begins by asking if we want to delete all the files in the remote target directory before copying any files there. This cleanall option is useful if we've deleted files in the local copy of the directory in the client -- the deleted files would remain on the server-side copy unless we delete all files there first. To implement the remote cleanup, this script simply gets a listing of all the files in the remote directory with the FTP nlst method, and deletes each in turn with the FTP delete method. Assuming we have delete permission, the directory will be emptied (file permissions depend on the account we logged into when connecting to the server). We've already moved to the target remote directory when deletions occur, so no directory paths must be prepended to filenames here.

Storing all local files

To apply the upload operation to each file in the local directory, we get a list of local filenames with the standard os.listdir call, and take care to prepend the local source directory path to each filename with the os.path.join call. Recall that os.listdir returns filenames without directory paths, and the source directory may not be the same as the script's execution directory if passed on the command line.

Text-based uploads

This script may be run on both Windows and Unix-like clients, so we need to handle text files specially. Like the mirror download, this script picks text or binary transfer modes by inspecting each filename's extension -- HTML and text files are moved in FTP text mode. We've already met the storbinary FTP object method used to upload files in binary mode -- an exact, byte-for-byte copy appears at the remote site.

Text mode transfers work almost identically: the storlines method accepts an FTP command string and a local file (or file-like) object opened in text mode, and simply copies each line in the local file to a same-named file on the remote machine. As usual, if we run this script on Windows, opening the input file in "r" text mode means that DOS-style \r\n end-of-line sequences are mapped to the \n character as lines are read. When the script is run on Unix and Linux, lines end in a single \n already, so no such mapping occurs. The net effect is that data is read portably, with \n characters to represent end-of-line. For binary files, we open in "rb" mode to suppress such automatic mapping everywhere (we don't want bytes that happen to have the same value as \r to magically disappear when read on Windows).[3]

[3] Technically, Python's storlines method automatically sends all lines to the server with \r\n line-feed sequences, no matter what it receives from the local file's readline method (\n or \r\n). Because of that, the most important distinctions for uploads are to use the "rb" for binary mode and the storlines method for text. Consult module ftplib.py in the Python source library directory for more details.

As for the mirror download script, this program simply iterates over all files to be transferred (files in the local directory listing this time), and transfers each in turn -- in either text or binary mode, depending on the files' names. Here is the command I use to upload my entire web site from my laptop Windows 98 PC to the remote Unix server at my ISP, in a single step:

C:\Stuff\Website\public_html>python %X%\Internet\Ftp\uploadflat.py 
Please enter password for starship.python.net:
Clean remote directory first?
connecting...
uploading .\LJsuppcover.jpg to LJsuppcover.jpg
uploading .\PythonPowered.gif to PythonPowered.gif
uploading .\PythonPoweredAnim.gif to PythonPoweredAnim.gif
uploading .\PythonPoweredSmall.gif to PythonPoweredSmall.gif
uploading .\Pywin.gif to Pywin.gif
uploading .\UPDATES to UPDATES
uploading .\about-hopl.html to about-hopl.html
uploading .\about-lp.html to about-lp.html
uploading .\about-pp-japan.html to about-pp-japan.html
...
 ...lines deleted...
...
uploading .\trainingCD.GIF to trainingCD.GIF
uploading .\uk-1.jpg to uk-1.jpg
uploading .\uk-2.jpg to uk-2.jpg
uploading .\uk-3.jpg to uk-3.jpg
uploading .\uploadflat.py to uploadflat.py
uploading .\whatsnew.html to whatsnew.html
uploading .\whatsold.html to whatsold.html
uploading .\xlate-lp.html to xlate-lp.html
Done: 131 files uploaded.

Like the mirror example, I usually run this command from the local directory where my web files are kept, and I pass Python the full path to the script. When I run this on the starship Linux server, it works the same, but the paths to the script and my web files directory differ. If you elect to clean the remote directory before uploading, you'll get a bunch of "deleting remote..." messages before the "uploading..." lines here, too:

...
deleting remote uk-3.jpg
deleting remote whatsnew.html
deleting remote whatsold.html
deleting remote xlate-lp.html
deleting remote uploadflat.py
deleting remote ora-lp-france.gif
deleting remote LJsuppcover.jpg
deleting remote sonyz505js.gif
deleting remote pic14.html
...

11.2.5 Uploads with Subdirectories

Perhaps the biggest limitation of the web site download and upload scripts we just met are that they assume the site directory is flat (hence their names) -- i.e., both transfer simple files only, and neither handles nested subdirectories within the web directory to be transferred.

For my purposes, that's a reasonable constraint. I avoid nested subdirectories to keep things simple, and I store my home web site as a simple directory of files. For other sites (including one I keep at the starship machine), site transfer scripts are easier to use if they also automatically transfer subdirectories along the way.

It turns out that supporting directories is fairly simple -- we need to add only a bit of recursion and remote directory creation calls. The upload script in Example 11-14 extends the one we just saw, to handle uploading all subdirectories nested within the transferred directory. Furthermore, it recursively transfers subdirectories within subdirectories -- the entire directory tree contained within the top-level transfer directory is uploaded to the target directory at the remote server.

Example 11-14. PP2E\Internet\Ftp\uploadall.py
#!/bin/env python
##########################################################################
# use ftp to upload all files from a local dir to a remote site/directory;
# this version supports uploading nested subdirectories too, but not the 
# cleanall option (that requires parsing ftp listings to detect remote 
# dirs, etc.);  to upload subdirectories, uses os.path.isdir(path) to see 
# if a local file is really a directory, FTP().mkd(path) to make the dir
# on the remote machine (wrapped in a try in case it already exists there), 
# and recursion to upload all files/dirs inside the nested subdirectory. 
# see also: uploadall-2.py, which doesn't assume the topremotedir exists.
##########################################################################
  
import os, sys, ftplib
from getpass import getpass

remotesite   = 'home.rmi.net'          # upload from pc or starship to rmi.net
topremotedir = 'public_html' 
remoteuser   = 'lutz'
remotepass   = getpass('Please enter password for %s: ' % remotesite)
toplocaldir  = (len(sys.argv) > 1 and sys.argv[1]) or '.'

print 'connecting...'
connection = ftplib.FTP(remotesite)               # connect to ftp site
connection.login(remoteuser, remotepass)          # login as user/password
connection.cwd(topremotedir)                      # cd to directory to copy to
                                                  # assumes topremotedir exists
def uploadDir(localdir):
    global fcount, dcount
    localfiles = os.listdir(localdir)
    for localname in localfiles:  
        localpath = os.path.join(localdir, localname) 
        print 'uploading', localpath, 'to', localname
        if os.path.isdir(localpath):
            # recur into subdirs
            try:
                connection.mkd(localname)
                print localname, 'directory created'
            except: 
                print localname, 'directory not created'
            connection.cwd(localname)
            uploadDir(localpath)
            connection.cwd('..')
            dcount = dcount+1
        else:
            if localname[-4:] == 'html' or localname[-3:] == 'txt':
                # use ascii mode xfer
                localfile = open(localpath, 'r')
                connection.storlines('STOR ' + localname, localfile)
            else:
                # use binary mode xfer
                localfile = open(localpath, 'rb')
                connection.storbinary('STOR ' + localname, localfile, 1024)
            localfile.close()
            fcount = fcount+1

fcount = dcount = 0
uploadDir(toplocaldir)
connection.quit()
print 'Done:', fcount, 'files and', dcount, 'directories uploaded.'

Like the flat upload script, this one can be run on any machine with Python and sockets and upload to any machine running an FTP server; I run it both on my laptop PC and on starship by Telnet to upload sites to my ISP.

In the interest of space, I'll leave studying this variant in more depth as a suggested exercise. Two quick pointers, though:

  • The crux of the matter here is the os.path.isdir test near the top; if this test detects a directory in the current local directory, we create a same-named directory on the remote machine with connection.mkd and descend into it with connection.cwd, and recur into the subdirectory on the local machine. Like all FTP object methods, mkd and cwd methods issue FTP commands to the remote server. When we exit a local subdirectory, we run a remote cwd('..') to climb to the remote parent directory and continue. The rest of the script is roughly the same as the original.

  • Note that this script handles only directory tree uploads; recursive uploads are generally more useful than recursive downloads, if you maintain your web sites on your local PC and upload to a server periodically, as I do. If you also want to download (mirror) a web site that has subdirectories, see the mirror scripts in the Python source distribution's Tools directory (currently, at file location Tools/scripts/ftpmirror.py). It's not much extra work, but requires parsing the output of a remote listing command to detect remote directories, and that is just complicated enough for me to omit here. For the same reason, the recursive upload script shown here doesn't support the remote directory cleanup option of the original -- such a feature would require parsing remote listings as well.

For more context, also see the uploadall-2.py version of this script in the examples distribution; it's similar, but coded so as not to assume that the top-level remote directory already exists.

    I l@ve RuBoard Previous Section Next Section