Previous Section  < Day Day Up >  Next Section

4.10 Column Types

MySQL can work with many different kinds of data. Generally speaking, data values can be grouped into three categories:

  • Numeric values. Numbers may or may not have a fractional part and may have a leading sign. For example, 14, -428.948, and +739 all are legal numbers. Integer values have no fractional part; columns for values with a fractional part can be declared to have either a fixed or variable number of decimal places. Numeric columns can be declared to be unsigned to prevent negative values from being accepted in the column.

  • String values. Strings may be case sensitive or case insensitive. Strings may store characters or raw data values that contain arbitrary byte values. Strings are written within quotes (for example, 'I am a string' or "I am a string"). String columns can be declared as either fixed length or variable length.

  • Temporal values. Temporal values include dates (such as '2005-11-03'), times (such as '14:23:00'), and values that have both a date and a time part ('2005-11-03 14:23:00'). MySQL also supports a special temporal type that represents year-only values efficiently. Date and time values can be written as quoted strings and may sometimes be written as numbers in contexts where numeric temporal values are understood.

When you create a table, the declaration for each of its columns includes the column name, a datatype specification that indicates what kind of values the column may hold, and possibly some options that more specifically define how MySQL should handle the column. For example, the following statement creates a table named people, which contains a numeric column named id and two 30-byte string columns named first_name and last_name:






CREATE TABLE people

(

    id         INT,

    first_name CHAR(30),

    last_name  CHAR(30)

);


The column definitions in this CREATE TABLE statement contain only names and column datatype specifications. To control the use of a column more specifically, options may be added to its definition. For example, to disallow negative values in the id column, add the UNSIGNED option. To disallow NULL (missing or unknown) values in any of the columns, add NOT NULL to the definition of each one. The modified CREATE TABLE statement looks like this:






CREATE TABLE people

(

    id         INT UNSIGNED NOT NULL,

    first_name CHAR(30) NOT NULL,

    last_name  CHAR(30) NOT NULL

);


For each of the general datatype categories (number, string, date, and time), MySQL has several specific column types from which to choose. It's important to properly understand the datatypes that are available for representing data, to avoid choosing a column type that isn't appropriate. The following sections provide a general description of the column datatypes and their properties. For additional details, the MySQL Reference Manual provides an extensive discussion on column datatypes.

4.10.1 Numeric Column Types

MySQL provides numeric column types for integer values, values with a fixed number of decimal places, and floating-point values that have a variable number of decimal places. When you choose a numeric column datatype, consider the following factors:

  • The range of values the datatype represents

  • The amount of storage space that column values require

  • The display width indicating the maximum number of characters to use when presenting column values in query output

  • The column precision (number of digits before the decimal) for values with a scale

4.10.1.1 Integer Column Types

Integer datatypes include TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT. Smaller datatypes require less storage space, but are more limited in the range of values they represent. For example, a TINYINT column has a small range (–128 to 127), but its values take only one byte each to store. INT has a much larger range (–2,147,483,648 to 2,147,483,647) but its values require four bytes each. The integer datatypes are summarized in the following table, which indicates the amount of storage each type requires as well as its range. For integer values declared with the UNSIGNED option, negative values are not allowed, and the high end of the range shifts upward to approximately double the maximum value.

Type

Storage Required

Signed Range

Unsigned Range

TINYINT

1 byte

-128 to 127

0 to 255

SMALLINT

2 bytes

-32,768 to 32,767

0 to 65,535

MEDIUMINT

3 bytes

-8,388,608 to 8,388,607

0 to 16,777,215

INT

4 bytes 2,147,483,647

-2,147,683,648 to

0 to 4,294,967,295

BIGINT

8 bytes

-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

0 to 18,446,744,073,709,551,615


Integer datatypes may be declared with a display width, which affects the number of characters used to display column values in query output. For example, assume that you declare an INT column with a display width of 4 like this:






century INT(4)


The result is that values in the century column will usually be displayed four digits wide.

It's important to remember that the display width is unrelated to the range of the datatype. The display width you define for a column affects only the maximum number of digits MySQL will use to display column values. Values shorter than the display width are padded with spaces as necessary. Note also that the display width is not a hard limit; it won't cause output truncation of a value that's too long to fit within the width. Instead, the full value will be shown. For example, assume that you've inserted the number 57622 into the century column. When you SELECT the column in a query, MySQL will display the entire value (57622) rather than just the first four digits of the value.

If you don't specify a display width for an integer type, MySQL chooses a default based on the number of characters needed to display the full range of values for the type (including the minus sign). For example, SMALLINT has a default display width of 6 because the widest possible value is -32768.

4.10.1.2 Floating-Point and Fixed-Decimal Column Types

The floating-point datatypes include FLOAT and DOUBLE. The fixed-point datatype is DECIMAL. Each of these types may be used to represent numeric values that have a scale, or fractional part. FLOAT and DOUBLE datatypes represent floating-point values in the native binary format used by the server host's CPU. This is a very efficient type for storage and computation, but values are subject to rounding error. DECIMAL uses a fixed-decimal storage format: All values in a DECIMAL column have the same number of decimal places. Values are stored in string format using one byte per digit. Numbers represented as strings cannot be processed as quickly as numbers represented in binary, so operations on DECIMAL columns are slower than operations on FLOAT and DOUBLE columns. DECIMAL values are not subject to rounding error when stored, which makes the DECIMAL column type a popular choice for financial applications involving currency calculations. However, be aware that currently MySQL does internal calculations using floating-point arithmetic, which can produce rounding error in the result.

FLOAT and DOUBLE are used to represent single-precision and double-precision floating-point values. They use 4 and 8 bytes each for storage, respectively. By default, MySQL represents values stored in FLOAT and DOUBLE columns to the maximum precision allowed by the hardware, but you can specify a display width and precision in the column definition. The following single-precision column definition specifies a display width of 10 digits, with a precision of 4 decimals:






avg_score FLOAT(10,4)


DECIMAL columns may also be declared with a display width and scale. If you omit them, the defaults are 10 and 0, so the following declarations are equivalent:






total DECIMAL

total DECIMAL(10)

total DECIMAL(10,0)


If you want to represent values such as dollar-and-cents currency figures, you can do so using a two-digit scale:






total DECIMAL(10,2)


The amount of storage required for DECIMAL column values depends on the type. Normally, the number of bytes of storage required per value is equal to the display width plus 2. For example, DECIMAL(6,3) requires 8 bytes: the display width is 6 and 2 bytes are needed to store the sign and decimal point. If the scale is 0, no decimal point needs to be stored, so one fewer byte is required. If the column is UNSIGNED, no sign character needs to be stored, also requiring one fewer byte.

4.10.2 String Column Types

The string column types are listed in the following table:

Type

Description

CHAR

Fixed-length string

VARCHAR

Variable-length string

BLOB

Variable-length binary string

TEXT

Variable-length nonbinary string

ENUM

Enumeration consisting of a fixed set of legal values

SET

Set consisting of a fixed set of legal values


When you choose a string datatype, consider the following factors:

  • The maximum length of values you need to store.

  • Whether to use a fixed or variable amount of storage.

  • Whether you need to store binary or nonbinary strings.

  • The number of distinct values required; ENUM or SET may be useful if the set of values is fixed.

The following discussion first describes the general differences between binary and nonbinary strings, and then the specific characteristics of each of the string column datatypes.

4.10.2.1 Binary and Nonbinary String Characteristics

Strings in MySQL may be treated as binary or nonbinary. The two types are each most suited to different purposes.

Binary strings have the following characteristics:

  • A binary string is treated as a string of byte values.

  • Comparisons of binary strings are performed on the basis of those byte values. This has the following implications:

    • Uppercase and lowercase versions of a given character have different byte values, so binary string comparisons are case sensitive.

    • Versions of a character that have different accent marks have different byte values, so binary string comparisons are also accent sensitive.

  • A multi-byte character, if stored as a binary string, is treated simply as multiple individual bytes. Character boundaries of the original data no longer apply.

Nonbinary strings are associated with a character set. The character set affects interpretation of string contents and sorting as follows:

  • A nonbinary string is a string of characters, all of which must belong to a specific character set. Characters may consist of a single byte, or multiple bytes if the character set allows it. For example, each character in the MySQL default character set (Latin-1, also known as ISO-8859-1) requires one byte to store. In contrast, the Japanese SJIS character set contains so many characters that they cannot all be represented in a single byte, so each character requires multiple bytes to store.

  • Nonbinary comparisons are based on the collating (sorting) order of the character set associated with the string.

  • Collating orders, or collations, sometimes treat uppercase and lowercase versions of a given character as equivalent. This means that comparisons using such collations are not case sensitive, so that, for example, 'ABC', 'Abc', and 'abc' are all considered equal.

  • Collations may also treat a given character with different accent marks as equivalent. The result is that comparisons of nonbinary strings may not be accent sensitive. For example, an a with no accent may be considered the same as the á and à characters.

  • Multi-byte character comparisons are performed in character units, not in byte units.

The preceding remarks regarding case and accent sensitivity are not absolute, just typical. A given character set can be defined with a collating order that's case or accent sensitive, or both. MySQL takes care to create character sets that correspond to the sorting order rules of different languages.

String comparison rules are addressed in more detail in section 6.1.1, "Case Sensitivity in String Comparisons."

The different treatment of binary and nonbinary strings in MySQL is important when it comes to choosing datatypes for table columns. If you want column values to be treated as case and accent insensitive, you should choose a nonbinary column type. Conversely, if you want case and accent sensitive values, choose a binary type. You should also choose a binary type for storing raw data values that consist of untyped bytes.

The CHAR and VARCHAR string column types are nonbinary by default, but can be made binary by including the keyword BINARY in the column definition. Other string types are inherently binary or nonbinary. BLOB columns are always binary, whereas TEXT columns are always nonbinary.

You can mix binary and nonbinary string columns within a single table. For example, assume that you want to create a table named auth_info, to store login name and password authorization information for an application. You want login names to match in any lettercase but passwords to be case sensitive. This statement would accomplish the task:






CREATE TABLE auth_info

(

    login    CHAR(16),          # not case sensitive

    password CHAR(16) BINARY    # case sensitive

);


4.10.2.2 The CHAR and VARCHAR Column Types

The CHAR and VARCHAR column types hold strings up to the maximum length specified in the column definition. To define a column with either of these datatypes, provide the column name, the keyword CHAR or VARCHAR, the maximum length of acceptable values in parentheses, and possibly the keyword BINARY. The maximum length should be a number from 0 to 255. (One of the sample exercises at the end of this chapter discusses why you might declare a zero-length column.) By default, CHAR and VARCHAR columns contain nonbinary strings. The BINARY modifier causes the values they contain to be treated as binary strings.

The CHAR datatype is a fixed-length type. Values in a CHAR column always take the same amount of storage. A column defined as CHAR(30), for example, requires 30 bytes for each value, even empty values. In contrast, VARCHAR is a variable-length datatype. A VARCHAR column takes only the number of bytes required to store each value, plus one byte per value to record the value's length.

For MySQL 4.0, the length for CHAR and VARCHAR columns is measured in bytes, not characters. There's no difference for single-byte character sets, but the two measures are different for multi-byte character sets. In MySQL 4.1, this will change; column lengths will be measured in characters. For example, CHAR(30) will mean 30 characters, even for multi-byte character sets.

4.10.2.3 The BLOB and TEXT Column Types

The BLOB and TEXT datatypes each come in four different sizes, differing in the maximum length of values they can store. All are variable-length types, so an individual value requires storage equal to the length (in bytes) of the value, plus 1 to 4 bytes to record the length of the value. The following table summarizes these datatypes; L represents the length of a given value.

Type

Storage Required

Maximum Length

TINYBLOB, TINYTEXT

L + 1 byte

255 bytes

BLOB, TEXT

L + 2 bytes

65,535 bytes

MEDIUMBLOB, MEDIUMTEXT

L + 3 bytes

16,777,215 bytes

LONGBLOB, LONGTEXT

L + 4 bytes

4,294,967,295 bytes


BLOB column values are always binary and TEXT column values are always nonbinary. When deciding which of the two to choose for a column, you would normally base your decision on whether you want to treat column values as case sensitive or whether they contain raw bytes rather than characters. BLOB columns are more suitable for case-sensitive strings or for raw data such as images or compressed data. TEXT columns are more suitable for case-insensitive character strings such as textual descriptions.

4.10.2.4 The ENUM and SET Column Types

Two of the string column types, ENUM and SET, are used when the values to be stored in a column are chosen from a fixed set of values. You define columns for both types in terms of string values, but MySQL represents them internally as integers. This leads to very efficient storage, but can have some surprising results unless you keep this string/integer duality in mind.

ENUM is an enumeration type. An ENUM column definition includes a list of allowable values; each value in the list is called a "member" of the list. Every value stored in the column must equal one of the values in the list. A simple (and very common) use for ENUM is to create a two-element list for columns that store yes/no or true/false choices. The following table shows how to declare such columns:






CREATE TABLE booleans

(

    yesno     ENUM('Y','N'),

    truefalse ENUM('T','F')

);


Enumeration values aren't limited to being single letters or uppercase. The columns could also be defined like this:






CREATE TABLE booleans

(

    yesno     ENUM('yes','no'),

    truefalse ENUM('true','false')

);


An ENUM column definition may list up to 65,535 members. Enumerations with up to 255 members require one byte of storage per value. Enumerations with 256 to 65,535 members require two bytes per value. The following table contains an enumeration column continent that lists continent names as valid enumeration members:






CREATE TABLE Countries

(

    name char(30),

    continent ENUM ('Asia','Europe','North America','Africa',

                    'Oceania','Antarctica','South America')

);


The values in an ENUM column definition are given as a comma-separated list of quoted strings. Internally, MySQL stores the strings as integers, using the values 1 through n for a column with n enumeration members. The following statement assigns the enumeration value 'Africa' to the continent column; MySQL actually stores the value 4 because 'Africa' is the fourth continent name listed in the enumeration definition:






INSERT INTO Countries (name,continent) VALUES('Kenya','Africa');


MySQL reserves the internal value 0 as an implicit member of all ENUM columns. It's used to represent illegal values assigned to an enumeration column. For example, if you assign 'USA' to the continent column, MySQL will store the value 0, rather than any of the values 1 through 7, because 'USA' is not a valid enumeration member. If you select the column later, MySQL displays 0 values as '' (the empty string).

The SET datatype, like ENUM, is declared using a comma-separated list of quoted strings that define its valid members. But unlike ENUM, a given SET column may be assigned a value consisting of any combination of those members. The following definition contains a list of symptoms exhibited by allergy sufferers:






CREATE TABLE allergy

(

    symptom SET('sneezing','runny nose','stuffy head','red eyes')

);


A patient may have any or all (or none) of these symptoms, and symptom values therefore might contain zero to four individual SET members, separated by commas. The following statements set the symptom column to the empty string (no SET members), a single SET member, and multiple SET members, respectively:






INSERT INTO allergy (symptom) VALUES('');

INSERT INTO allergy (symptom) VALUES('stuffy head');

INSERT INTO allergy (symptom) VALUES('sneezing,red eyes');


MySQL represents SET columns as a bitmap using one bit per member, so the elements in the symptom definition have internal values of 1, 2, 4, and 8 (that is, they have the values of bits 0 through 3 in a byte). Internally, MySQL stores the values shown in the preceding INSERT statements as 0 (no bits set), 4 (bit 2 set), and 9 (bits 0 and 3 set; that is, 1 plus 8).

A SET definition may contain up to 64 members. The internal storage required for set values varies depending on the number of SET elements (1, 2, 3, 4, or 8 bytes for sets of up to 8, 16, 24, 32, or 64 members).

If you try to store an invalid list member into a SET column, it's ignored because it does not correspond to any bit in the column definition. For example, setting a symptom value to 'coughing,sneezing,wheezing' results in an internal value of 1 ('sneezing'). The 'coughing' and 'wheezing' elements are ignored because they aren't listed in the column definition as legal set members.

As mentioned earlier in this section, the conversion between string and numeric representations of ENUM and SET values can result in surprises if you aren't careful. For example, although you would normally refer to an enumeration column using the string forms of its values, you can also use the internal numeric values. The effect of this can be very subtle if the string values look like numbers. Suppose that you define a table t like this:






CREATE TABLE t (age INT, siblings ENUM('0','1','2','3','>3'));


In this case, the enumeration values are the strings '0', '1', '2', '3', and '>3', and the matching internal numeric values are 1, 2, 3, 4, and 5, respectively. Now suppose that you issue the following statement:






INSERT INTO t (age,siblings) VALUES(14,'3');


The siblings value is specified here as the string '3', and that is the value assigned to the column in the new record. However, you can also specify the siblings value as a number, as follows:






INSERT INTO t (age,siblings) VALUES(14,3);


But in this case, 3 is interpreted as the internal value, which corresponds to the enumeration value '2'! The same principle applies to retrievals. Consider the following two statements:






SELECT * FROM t WHERE siblings = '3';

SELECT * FROM t WHERE siblings = 3;


In the first case, you get records that have an enumeration value of '3'. In the second case, you get records where the internal value is 3; that is, records with an enumeration value of '2'.

4.10.3 Date and Time Column Types

MySQL provides column types for storing different kinds of temporal information. In the following descriptions, the terms YYYY, MM, DD, hh, mm, and ss stand for a year, month, day of month, hour, minute, and second value, respectively.

The storage requirements and ranges for the date and time datatypes are summarized in the following table:

Type

Storage Required

Range

DATE

3 bytes

'1000-01-01' to '9999-12-31'

TIME

3 bytes

'-838:59:59' to '838:59:59'

DATETIME

8 bytes

'1000-01-01 00:00:00' to '9999-12-31 23:59:59'

TIMESTAMP

4 bytes

'1970-01-01 00:00:00' to mid-year 2037

YEAR

1 byte

1901 to 2155 (YEAR(4)), 1970 to 2069 (YEAR(2))


For TIMESTAMP, MySQL 4.0 displays values such as '1970-01-01 00:00:00' as 19700101000000. In MySQL 4.1, this changes so that TIMESTAMP display format is the same as DATETIME.

Each of these temporal datatypes also has a "zero" value that's used when you attempt to store an illegal value. The "zero" value is represented in a format appropriate for the type (such as '0000-00-00' for DATE and '00:00:00' for TIME).

4.10.3.1 The DATE and TIME Column Types

The DATE datatype represents date values in 'YYYY-MM-DD' format. This representation corresponds to the ANSI SQL date format, also known as ISO 8601 format.

The supported range of DATE values is '1000-01-01' to '9999-12-31'. You might be able to use earlier dates than that, but it's better to stay within the supported range to avoid unexpected behavior.

MySQL always represents DATE values in ISO 8601 format when it displays them. If necessary, you can reformat DATE values into other display formats using the DATE_FORMAT() function.

For date entry, MySQL also expects to receive dates in ISO format, or at least close to ISO format. That is, date values must be given in year-month-day order, but some deviation from strict ISO format is allowed:

  • Leading zeros on month and day values may be omitted. For example, '2000-1-1' and '2000-01-01' are both accepted as legal.

  • The delimiter between date parts need not be -; you can use other punctuation characters, such as /.

  • Two-digit years are converted to four-digit years. You should be aware that this conversion is done based on the rule that year values from 70 to 99 represent the years 1970 to 1999, whereas values from 00 to 69 represent the years 2000 to 2069. It's better to provide values with four-digit years to avoid problems with conversion of values for which the rule does not apply.

Instead of attempting to load values that aren't in an acceptable format into a DATE column, you should convert them into ISO format. An alternative approach that's useful in some circumstances is to load the values into a string column and perform reformatting operations using SQL string functions to produce ISO format values that can be assigned to a DATE column.

The TIME datatype represents time values in 'hh:mm:ss' format. TIME values may represent elapsed time, and thus might be outside the range of time-of-day values. They may even be negative values. (The actual range of TIME values is '-838:59:59' to '838:59:59'.)

MySQL represents TIME values in 'hh:mm:ss' format when displaying them. If necessary, you can reformat TIME values into other display formats using the TIME_FORMAT() function.

For TIME value entry, some variation on this format is allowed. For example, leading zeros on TIME parts may be omitted.

4.10.3.2 The TIMESTAMP and DATETIME Column Types

The DATETIME column type stores date-and-time values in 'YYYY-MM-DD hh:mm:ss' format. It's similar to a combination of DATE and TIME values, but the TIME part represents time of day rather than elapsed time and has a range limited to '00:00:00' to '23:59:59'. The date part of DATETIME columns has the same range as DATE columns; combined with the TIME part, this results in a DATETIME range from '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.

The TIMESTAMP type, like DATETIME, stores date-and-time values, but has a different range and some special properties that make it especially suitable for tracking data modification times. TIMESTAMP also has a different display format from DATETIME prior to MySQL 4.1:

  • Until MySQL 4.1, TIMESTAMP values are represented as numbers in YYYYMMDDhhmmss format. The default display width is 14 digits, but you can specify an explicit width of any even number from 2 to 14. The display width affects only how MySQL displays TIMESTAMP values, not how it stores them. Stored values always include the full 14 digits.

  • From MySQL 4.1 on, the TIMESTAMP format is 'YYYY-MM-DD hh:mm:ss', just like DATETIME. Display widths are not supported.

The range of TIMESTAMP values begins at 1970-01-01 00:00:00 (GMT) and extends partway into the year 2037. TIMESTAMP values actually represent the number of seconds elapsed since the beginning of 1970 and are stored using four bytes. This provides room for sufficient seconds to represent a date in the year 2037. Note that TIMESTAMP values are stored using the server's local timezone.

TIMESTAMP columns have the following special properties:

  • Storing NULL into a TIMESTAMP column sets it to the current date and time. Updating a TIMESTAMP column to NULL also sets it to the current date and time.

  • If you omit a TIMESTAMP column from an INSERT statement, MySQL inserts the current date and time if the column is the first TIMESTAMP column in the table, and inserts zero if it is not.

  • MySQL automatically updates the first TIMESTAMP column in a table to the current date and time when you update (change the existing data in) any other column in the table. (Setting a column to its current value doesn't count as updating it.) Only the first TIMESTAMP column is subject to automatic updating. All other TIMESTAMP columns do not change unless you update them explicitly.

It's important to know about the automatic-update property. It's what makes TIMESTAMP columns useful for tracking record modification times, but is a source of confusion if you're not aware of it. People who choose TIMESTAMP for a column on the basis of the fact that it stores date-and-time values become dismayed and mystified when they discover that the column's values change unexpectedly.

4.10.3.3 The YEAR Column Type

The YEAR column type represents year-only values. You can declare such columns as YEAR(4) or YEAR(2) to obtain a four-digit or two-digit display format. If you don't specify any display width, the default is four digits.

If you don't need a full date and the range of values you need to store falls into the YEAR range, consider using YEAR to store temporal values. It's a very space-efficient datatype because values require only one byte of storage each.

4.10.4 Column Options

The final part of a column definition (following the datatype) can include optional modifiers. These options are described in the following list. Note that many of them apply only to certain column types.

  • UNSIGNED applies to numeric datatypes and causes negative values to be disallowed. If you attempt to store a negative value in an UNSIGNED column, MySQL stores zero instead.

  • ZEROFILL applies to integer numeric column types. It causes retrieved values to be left-padded with leading zeros up to the column's display width. For example, if you store the values 0, 14, and 1234 in a column that's defined as INT(5) ZEROFILL, MySQL displays them as 00000, 00014, and 01234 when you retrieve them.

    Using the ZEROFILL option for a column causes it to be UNSIGNED as well.

  • AUTO_INCREMENT applies to integer numeric column types. It's used to generate sequences of successive unique values. Defining a column with AUTO_INCREMENT causes a special behavior: When you insert NULL into the column, MySQL generates the next value in the sequence automatically and stores that in the column instead. Use of AUTO_INCREMENT carries with it other requirements: There may be only one AUTO_INCREMENT column per table, the column must be indexed, and the column must be defined as NOT NULL. Section 4.10.5, "Using the AUTO_INCREMENT Column Option," provides specific details on the use of AUTO_INCREMENT columns.

  • BINARY applies to the CHAR and VARCHAR datatypes. CHAR and VARCHAR columns are nonbinary by default; adding BINARY to the definition causes column values to be treated as binary strings.

    Beginning with MySQL 4.1, BINARY may also be applied to ENUM and SET columns to cause case-sensitive treatment of column values.

  • NULL and NOT NULL apply to all column types. They indicate whether or not a column can contain NULL values. If you specify neither option, the default is NULL, which allows NULL values in the column.

  • DEFAULT value provides the column with a default value to be used when you create a new record but don't explicitly specify a value for the column (for example, when you execute an INSERT statement that doesn't provide values for all columns in the table). This attribute applies to all column types except BLOB and TEXT. A default value must be a constant; it cannot be an expression whose value is calculated at record-creation time.

    If you don't specify a DEFAULT value for a column, MySQL chooses a default for you. The value is NULL if the column may contain NULL; otherwise, the value depends on the column type:

    • For numeric columns, the default is zero.

    • For string columns other than ENUM, the default is the empty string. For ENUM columns, the default is the first enumeration member.

    • For temporal columns, the default value is the "zero" value for the datatype, represented in whatever format is appropriate to the column type (for example, '0000-00-00' for DATE and '00:00:00' for TIME).

    The exceptions to the preceding are the first TIMESTAMP column in a table and integer columns that have the AUTO_INCREMENT attribute. For such columns, MySQL uses a default value of the current date and time, and the next sequence number, respectively. Furthermore, if you supply a DEFAULT option for these column types, MySQL ignores it or produces an error.

    It's an error to specify a default value of NULL for a NOT NULL column.

  • PRIMARY KEY and UNIQUE may be given at the end of a column definition, for all datatypes except BLOB and TEXT. They cause the creation of a PRIMARY KEY or UNIQUE index for the column. Adding either of these options to a column definition is the same as defining the index in a separate clause. For example, the following table definitions are equivalent:

    
    
    
    

    
    CREATE TABLE t (i INT NOT NULL PRIMARY KEY);
    
    
    
    
    
    CREATE TABLE t (i INT NOT NULL, PRIMARY KEY (i));
    
    

4.10.5 Using the AUTO_INCREMENT Column Option

The AUTO_INCREMENT option may be added to an integer column definition to create a column for which MySQL automatically generates a new sequence number each time you create a new row. The option is used in conjunction with an index (usually a primary key) and provides a mechanism whereby each value is a unique identifier that can be used to refer unambiguously to the row in which it occurs. MySQL also provides a LAST_INSERT_ID() function that returns the most recently generated AUTO_INCREMENT value. This function is useful for determining the identifier when you need to look up the record just created, or when you need to know the identifier to create related records in other tables.

The following scenario illustrates how you can set up and use an AUTO_INCREMENT column. Assume that you're organizing a conference and need to keep track of attendees and the seminars for which each attendee registers. (When someone submits a registration form for the conference, the form must indicate which of the available seminars the person wants to attend.)

Your task is to record seminar registrations and associate them with the appropriate attendee. Unique ID numbers provide a way to keep track of attendees and an AUTO_INCREMENT column makes the implementation for the task relatively easy:

  1. Set up an attendee table to record information about each person attending the conference. The table shown here includes columns for ID number, name, and job title:

    
    
    
    

    
    mysql> CREATE TABLE attendee
    
        -> (
    
        ->     att_id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    
        ->     att_name    CHAR(100),
    
        ->     att_title   CHAR(40),
    
        ->     PRIMARY KEY (att_id)
    
        -> );
    
    

    The att_id column is created as a PRIMARY KEY because it must contain unique values, and as an AUTO_INCREMENT column because it's necessary for MySQL to generate values for the column automatically.

  2. Set up a seminar table to record the seminars for which each attendee registers. Assume that there are four seminars: Database Design, Query Optimization, SQL Standards, and Using Replication. There are various ways in which these seminars can be represented; an ENUM column is one that works well because the seminar titles form a small fixed list of values. The table must also record the ID of each attendee taking part in the seminar. The table can be created with this statement:

    
    
    
    

    
    mysql> CREATE TABLE seminar
    
        -> (
    
        ->     att_id     INT UNSIGNED NOT NULL,
    
        ->     sem_title  ENUM('Database Design','Query Optimization',
    
        ->                     'SQL Standards','Using Replication'),
    
        ->     INDEX (att_id)
    
        -> );
    
    

    Note both the differences and similarities of the att_id column declarations in the two tables. In attendee, att_id is an AUTO_INCREMENT column and is indexed as a PRIMARY KEY to ensure that each value in the column is unique. In seminar, att_id is indexed for faster lookups, but it isn't indexed as a PRIMARY KEY. (There might be multiple records for a given attendee and a PRIMARY KEY does not allow duplicates.) Nor is the column declared in the seminar table with the AUTO_INCREMENT option because ID values should be tied to existing IDs in the attendee table, not generated automatically. Aside from these differences, the column is declared using the same datatype (INT) and options (UNSIGNED, NOT NULL) as the att_id column in the attendee table.

  3. Each time a conference registration form is received, enter the attendee information into the attendee table. For example:

    
    
    
    

    
    mysql> INSERT INTO attendee (att_name,att_title)
    
        -> VALUES('Charles Loviness','IT Manager');
    
    

    Note that the INSERT statement doesn't include a value for the att_id column. Because att_id is an AUTO_INCREMENT column, MySQL generates the next sequence number (beginning with 1) and sets the att_id column in the new row to that value. You can use the new att_id value to look up the record just inserted, but how do you know what value to use? The answer is that you don't need to know the exact value. Instead, you can get the ID by invoking the LAST_INSERT_ID() function, which returns the most recent AUTO_INCREMENT value generated during your current connection with the server. Thus, the record for Charles Loviness can be retrieved like this:

    
    
    
    

    
    mysql> SELECT * FROM attendee WHERE att_id = LAST_INSERT_ID();
    
    +--------+------------------+------------+
    
    | att_id | att_name         | att_title  |
    
    +--------+------------------+------------+
    
    |      3 | Charles Loviness | IT Manager |
    
    +--------+------------------+------------+
    
    

    This output indicates that the Loviness form was the third one entered.

  4. Next, enter new records into the seminar table for each seminar marked on the entry form. The att_id value in each of these records must match the att_id value in the newly created attendee record. Here again, the LAST_INSERT_ID() value can be used. If Loviness will participate in Database Design, SQL Standards, and Using Replication, create records for those seminars as follows:

    
    
    
    

    
    mysql> INSERT INTO seminar (att_id,sem_title)
    
        -> VALUES(LAST_INSERT_ID(),'Database Design');
    
    mysql> INSERT INTO seminar (att_id,sem_title)
    
        -> VALUES(LAST_INSERT_ID(),'SQL Standards');
    
    mysql> INSERT INTO seminar (att_id,sem_title)
    
        -> VALUES(LAST_INSERT_ID(),'Using Replication');
    
    

    To see what the new seminar records look like, use the LAST_INSERT_ID() value to retrieve them:

    
    
    
    

    
    mysql> SELECT * FROM seminar WHERE att_id = LAST_INSERT_ID();
    
    +--------+-------------------+
    
    | att_id | sem_title         |
    
    +--------+-------------------+
    
    |      3 | Database Design   |
    
    |      3 | SQL Standards     |
    
    |      3 | Using Replication |
    
    +--------+-------------------+
    
    

  5. When you receive the next registration form, repeat the process just described. For every new attendee record, the value of LAST_INSERT_ID() will change to reflect the new value in the att_id column.

The preceding description shows how to use an AUTO_INCREMENT column—how to declare the column, how to generate new ID values when inserting new records, and how to use the ID values to tie together related tables. However, the description glosses over some of the details. These are presented in the following discussion, beginning with declaration syntax and then providing further information about how AUTO_INCREMENT columns work.

The att_id-related declarations in the attendee table look like this:






att_id INT UNSIGNED NOT NULL AUTO_INCREMENT,

PRIMARY KEY (att_id)


These declarations involve the following factors, which you should consider when creating an AUTO_INCREMENT column:

  • The column must be an integer type. Choose the specific datatype based on the number of values the column must be able to hold. For the largest range, use BIGINT. However, BIGINT requires 8 bytes per value. If you want to use less storage, INT requires only 4 bytes per value and provides a range that's adequate for many applications. You can use integer types smaller than INT as well, but it's a common error to choose one that's too small. For example, TINYINT has a range that allows very few unique numbers, so you'll almost certainly run into problems using it as an AUTO_INCREMENT column for identification purposes.

  • An AUTO_INCREMENT sequence contains only positive values. For this reason, it's best to declare the column to be UNSIGNED. Syntactically, it isn't strictly required that you declare the column this way, but doing so doubles the range of the sequence because an UNSIGNED integer column has a larger maximum value. Defining the column as UNSIGNED also serves as a reminder that you should never store negative values in an AUTO_INCREMENT column.

  • The most common way to use an AUTO_INCREMENT column is as a primary key, which ensures unique values and prevents duplicates. The column should thus be defined to contain unique values, either as a PRIMARY KEY or a UNIQUE index. (MySQL allows you to declare an AUTO_INCREMENT column with a nonunique index, but this is less common.)

  • An AUTO_INCREMENT column defined as a PRIMARY KEY must also be NOT NULL.

After setting up an AUTO_INCREMENT column, use it as follows:

  • Inserting NULL into an AUTO_INCREMENT column causes MySQL to generate the next sequence value and store it in the column. Omitting the AUTO_INCREMENT column from an INSERT statement is the same as inserting NULL explicitly. In other words, an INSERT statement that does not provide an explicit value for an AUTO_INCREMENT column also generates the next sequence value for the column. For example, if id is an AUTO_INCREMENT column in the table t, the following two statements are equivalent:

    
    
    
    

    
    INSERT INTO t (id,name) VALUES(NULL,'Hans');
    
    INSERT INTO t (name) VALUES('Hans');
    
    

  • Currently, inserting 0 into an AUTO_INCREMENT column has the same effect as inserting NULL: the next sequence value is generated. However, it isn't recommended that you rely on this behavior because it might change in the future.

  • A positive value can be inserted explicitly into an AUTO_INCREMENT column if the value isn't already present in the column. If this value is larger than the current sequence counter, subsequent automatically generated values begin with the value plus one:

    
    
    
    

    
    mysql> CREATE TABLE t (id INT AUTO_INCREMENT, PRIMARY KEY (id));
    
    mysql> INSERT INTO t (id) VALUES(NULL),(NULL),(17),(NULL),(NULL);
    
    mysql> SELECT id FROM t;
    
    +----+
    
    | id |
    
    +----+
    
    |  1 |
    
    |  2 |
    
    | 17 |
    
    | 18 |
    
    | 19 |
    
    +----+
    
    

  • After an AUTO_INCREMENT value has been generated, the LAST_INSERT_ID() function returns the generated value. LAST_INSERT_ID() will continue to return the same value, regardless of the number of times it's invoked, until another AUTO_INCREMENT value is generated.

  • The value returned by LAST_INSERT_ID() is specific to the client that generates the AUTO_INCREMENT value. That is, it's connection-specific, so the LAST_INSERT_ID() value is always correct for the current connection, even if other clients also generate AUTO_INCREMENT values of their own. Another client cannot change the value that LAST_INSERT_ID() returns to you, nor can one client use LAST_INSERT_ID() to determine the AUTO_INCREMENT value generated by another.

  • AUTO_INCREMENT behavior is the same for REPLACE as it is for INSERT. Any existing record is deleted, and then the new record is inserted. Consequently, replacing an AUTO_INCREMENT column with NULL or 0 causes it to be set to the next sequence value.

  • If you update an AUTO_INCREMENT column to NULL or 0 in an UPDATE statement, the column is set to 0.

  • If you delete rows containing values at the high end of a sequence, those values are not reused for MyISAM or InnoDB tables when you insert new records. For example, if an AUTO_INCREMENT column contains the values from 1 to 10 and you delete the record containing 10, the next sequence value is 11, not 10. (This differs from ISAM and BDB tables, for which values deleted from the high end of a sequence are reused.)

The MyISAM storage engine supports composite indexes that include an AUTO_INCREMENT column. This allows creation of independent sequences within a single table. Consider the following table definition:






CREATE TABLE multisequence

(

    name     CHAR(10) NOT NULL,

    name_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,

    PRIMARY KEY (name, name_id)

);


Inserting name values into the multisequence table generates separate sequences for each distinct name:






mysql> INSERT INTO multisequence (name)

    -> VALUES('Petr'),('Ilya'),('Ilya'),('Yuri'),('Ilya'),('Petr');

mysql> SELECT * FROM multisequence ORDER BY name, name_id;

+------+---------+

| name | name_id |

+------+---------+

| Ilya |       1 |

| Ilya |       2 |

| Ilya |       3 |

| Petr |       1 |

| Petr |       2 |

| Yuri |       1 |

+------+---------+


Note that for this kind of AUTO_INCREMENT column, values deleted from the high end of any sequence are reused. This differs from MyISAM behavior for single-column AUTO_INCREMENT sequences.

4.10.6 Automatic Type Conversion and Value Clipping

For historical reasons, MySQL is forgiving about signaling an error if a given value doesn't match the datatype of the column that is the insert target. Instead, MySQL does its best to perform automatic type conversion. For example, if you attempt to store a negative value in an UNSIGNED integer column, MySQL silently converts it to zero, which is the nearest legal value for the column. In other words, the MySQL server converts input values to the types expected from the column definitions, inserts the result, and continues on its way.

If you need to prevent attempts to insert invalid values into a table, you should first validate the values on the client side; however, because that isn't an exam topic, it isn't discussed further here.

This section describes the kinds of conversions that MySQL performs and the circumstances under which they occur. After you know these principles, you'll know what types of validation are necessary before trying to store your data in a MySQL database.

In many cases, type conversion affords you the flexibility to write a statement different ways and get the same result. For example, if i is an integer column, the following statements both insert 43 into it, even though the value is specified as a number in one statement and as a string in the other:






INSERT INTO t (i) VALUES(43);

INSERT INTO t (i) VALUES('43');


MySQL performs automatic string-to-number conversion for the second statement.

In other cases, the effects of type conversion might be surprising, particularly if you're unaware that these conversions occur. You can avoid such surprises by understanding the conditions under which conversion takes place. In general, MySQL performs type conversion based on the constraints implied by a column's definition. These constraints apply in several contexts:

  • When you insert or update column values with statements such as INSERT, REPLACE, UPDATE, or LOAD DATA INFILE.

  • When you change a column definition with ALTER TABLE.

  • When you specify a default value using a DEFAULT value option in a column definition. (For example, if you specify a negative default for an UNSIGNEDcolumn, the value is converted, resulting in a default of zero.)

The following list discusses some of the conversions that MySQL performs. It isn't exhaustive, but is sufficiently representative to provide you with a good idea of how MySQL treats input values and what you'll be tested on in the exam. Circumstances under which automatic type conversion occurs include the following:

  • Conversion of out-of-range values to in-range values. If you attempt to store a value that's smaller than the minimum value allowed by the range of a column's datatype, MySQL stores the minimum value in the range. If you attempt to store a value that's larger than the maximum value in the range, MySQL stores the range's maximum value. Some examples of this behavior are as follows:

    • TINYINT has a range of –128 to 127. If you attempt to store values less than –128 in a TINYINT column, MySQL stores –128 instead. Similarly, MySQL stores values greater than 127 as 127.

    • If you insert a negative value into an UNSIGNED integer column, MySQL converts the value to 0.

    • When you reach the upper limit of an AUTO_INCREMENT column, an attempt to generate the next sequence value results in a duplicate-key error. This is a manifestation of MySQL's general out-of-range value clipping behavior. For example, assume that you have a TINYINT UNSIGNED column as an AUTO_INCREMENT column and that it currently contains 254 as the maximum sequence value. The upper limit for this column type is 255, so the next insert generates a sequence value of 255 and successfully stores it in the new record. However, the insert after that fails because MySQL generates the next sequence value, which is 256. Because 256 is higher than the column's upper limit of 255, MySQL clips 256 down to 255 and attempts to insert that value. But because 255 is already present in the table, a duplicate-key error occurs.

  • Conversion to datatype default. If you attempt to store a value for which MySQL cannot decide on an appropriate conversion, it stores the default value for the datatype of the target column. For example, if you try to store the value 'Sakila' in an INT column, MySQL stores the value 0. For dates, the "zero" value is 0000-00-00 and for time columns 00:00:00. More details on the default for each column type are given in section 4.10.4, "Column Options."

  • String truncation. If you attempt to store a string value into a VARCHAR or CHAR column with a defined length that's shorter than the string, the string is truncated to fit the column's length. That is, only the leading characters that fit into the column are stored. The remaining characters are discarded. For example, if you try to store the value 'Sakila' into a column defined as CHAR(4), MySQL stores the value 'Saki'.

  • Date and time interpretation. The server performs streamlined checking of temporal values. It looks at individual components of date and time values, but does not perform an exhaustive check of the value as a whole. For example, day values may be considered valid as long as they're within the range 1 to 31. This means you can specify a date such as '2000-04-31' and MySQL will store it as given. However, a DATETIME value such as '2000-01-01 24:00:00' contains an hour value of 24, which is never valid as a time of day. Consequently, MySQL stores the "zero" value in DATETIME format ('0000-00-00 00:00:00').

  • Addition of century for two-digit years. Like many other computer programs, MySQL converts two-digit years to four-digit years. Values 00 to 69 are converted to 2000-2069; values 70 to 99 are converted to 1970-1999.

  • Enumeration and set value conversion. If a value that's assigned to an ENUM column isn't listed in the ENUM definition, MySQL converts it to '' (the empty string). If a value that's assigned to a SET column contains elements that aren't listed in the SET definition, MySQL discards those elements, retaining only the legal elements.

  • Handing assignment of NULL to NOT NULL columns. The effect of assigning NULL to a NOT NULL column depends on whether the assignment occurs in a single-row or multiple-row INSERT statement. For a single-row INSERT, the statement fails. For a multiple-row INSERT, the column is assigned the default value for the column type.

  • Conversion of fixed-point values. Conversion, into numbers, of string values that can be interpreted as numbers is different for DECIMAL than for other numeric datatypes. This occurs because DECIMAL values are represented as strings rather than in native binary format. For example, if you assign '0003' to an INT or FLOAT, it's stored as 3 in integer or floating-point binary format. In contrast, if you assign '0003' to a DECIMAL column, it's stored without change, including the leading zeros, even though it will behave identically to a '3' in mathematical operations. If the DECIMAL column isn't wide enough to accommodate the leading zeros, as many are stored as possible. If you store '0003' into a DECIMAL(2,0) UNSIGNED column, it's converted to '03'.

Using ALTER TABLE to change a column's datatype maps existing values to new values according to the constraints imposed by the new datatype. This might result in some values being changed. For example, if you change a TINYINT to an INT, no values are changed because all TINYINT values fit within the INT range. However, if you change an INT to a TINYINT, any values that lie outside the range of TINYINT are clipped to the nearest endpoint of the TINYINT range. Similar effects occur for other types of conversions, such as TINYINT to TINYINT UNSIGNED (negative values are converted to zero), and converting a long string column to a shorter one (values that are too long are truncated to fit the new size).

If a column is changed to NOT NULL using ALTER TABLE, MySQL converts NULL values to the default value for the column type.

The following table shows how several types of string values are handled when converted to date or numeric datatypes. It demonstrates several of the points just discussed. Note that only string values that look like dates or numbers convert properly without loss of information. Note too that leading zeros are retained for the DECIMAL column during conversion.

String Value

Converted to DATE

Converted to INT

Converted to DECIMAL

'2010-12-03'

'2010-12-03'

2010

2010

'zebra'

'0000-00-00'

0

0

'500 hats'

'0000-00-00'

500

500

'1978-06-12'

'1978-06-12'

1970

1970

'06-12-1978'

'0000-00-00'

6

06

'0017'

'0000-00-00'

17

0017


    Previous Section  < Day Day Up >  Next Section