|< Day Day Up >|
9.3 Datafile Format Specifiers
LOAD DATA INFILE and SELECT … INTO OUTFILE both assume by default a datafile format in which column values are separated by tab characters and records are terminated by newlines. If a datafile to be read by LOAD DATA INFILE has different column separators or line terminators, you must indicate what the format is so that MySQL can read the file contents correctly. Similarly, if you want SELECT … INTO OUTFILE to write a file with different separators or terminators, you'll need to indicate the format to use. It's also possible to control quoting and escaping behavior.
LOAD DATA INFILE and SELECT … INTO OUTFILE don't allow you to specify anything about the characteristics of specific individual columns in the datafile, such as that column 3 is numeric or that column 17 contains dates. Instead, you define the general characteristics that apply to all column values: What characters separate column values in data rows, whether values are quoted, and whether there is an escape character that signifies special character sequences.
For LOAD DATA INFILE, any format specifiers given are listed after the table name. For SELECT … INTO OUTFILE, they follow the output filename. The syntax for format specifiers is the same for both statements and looks like this:
FIELDS TERMINATED BY 'string' ENCLOSED BY 'char' ESCAPED BY 'char' LINES TERMINATED BY 'string'
The FIELDS clause defines the formatting of data values within a line and the LINES clause defines the line-ending sequence. In other words, FIELDS indicates the structure of column values within records and LINES indicates where record boundaries occur.
The TERMINATED BY, ENCLOSED BY, and ESCAPED BY parts of the FIELDS clause may be given in any order. You need not specify all three parts. Defaults are used for any that are missing (or if the FIELDS clause itself is missing):
The default line terminator is the newline (linefeed) character. To indicate a line-ending sequence explicitly, use a LINES clause. Common line-terminator specifiers are newline, carriage return, and carriage return/newline pairs. These are specified as follows:
LINES TERMINATED BY '\n' LINES TERMINATED BY '\r' LINES TERMINATED BY '\r\n'
Because newline is the default line terminator, it need be specified only if you want to make the line-ending sequence explicit. Newline terminators are common on Unix systems, carriage returns are common on Mac OS and Mac OS X, and carriage return/newline pairs are common on Windows.
The ESCAPED BY option controls only the handling of values in the datafile, not how you specify the statement itself. If you want to specify a datafile escape character of @, you'd write ESCAPED BY '@'. That doesn't mean you then use @ to escape special characters elsewhere in the statement. For example, you'd still specify carriage return as the line termination character using LINES TERMINATED BY '\r', not using LINES TERMINATED BY '@r'.
Suppose that a file named data.txt contains information in comma-separated values (CSV) format, with values quoted by double quote characters and lines terminated by carriage returns. To load the file into a table t, use this LOAD DATA INFILE statement:
LOAD DATA INFILE 'data.txt' INTO TABLE t FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r';
SELECT * INTO OUTFILE 'data.txt' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r' FROM t;
|< Day Day Up >|