Internationalization and Localization Issues
Internationalization refers to the capability of software to be used according to local convention, for any of a variety of locations. Localization refers to selecting a particular set of local conventions from among those sets that are supported. The following aspects of MySQL configuration relate to internationalization and localization:
Configuring Time Zone Support
Your server's time zone capabilities depend on what version of MySQL you are running. In all versions of MySQL, the server sets its default time zone by examining its environment. Most often, this will be the local time zone of the server host, but you can specify the time zone explicitly at server startup. In addition, as of MySQL 4.1.3, the server allows each client that connects to override the default setting and set its own time zone. This enables applications to use time settings that depend on where the client program is running rather than where the server is running. The following discussion describes the current capabilities for multiple time zones, and then briefly describes how to set the time zone for older single time zone servers.
From MySQL 4.1.3 on, two system variables hold time zone information:
SELECT @@GLOBAL.time_zone, @@SESSION.time_zone;
You can set the value of time_zone using three kinds of variables, although one of them requires additional administrative action. The statements shown here set the session value. Substitute GLOBAL to set the global value. (However, be aware that you cannot set the GLOBAL value at runtime if binary logging is enabled, because that would cause replication difficulties.)
To use the third method (setting the time zone by name), you must enable the server to understand time zone names by loading information from the operating system's time zone files into a set of tables in the mysql database. This does not happen automatically during MySQL installation. You must populate the tables manually by using the mysql_tzinfo_to_sql program. This program reads the time zone files and constructs SQL statements from their contents. You can feed these statements to the mysql program to execute them.
To set up the time zone tables on a system that has time zone files, determine where they are installed. If this location is /usr/share/zoneinfo, the command to load the files into the mysql database looks like this:
% mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -p -u root mysql
That should suffice for most versions of Unix. For Windows and for Unix systems that do not have a set of time zone files, you can obtain a package containing a set of pre-built MyISAM tables containing time zone information from this location:
Download the package and unpack it. With the MySQL server stopped, copy the .frm, .MYD, and .MYI files into the mysql database directory under your data directory. Then restart the server.
Before MySQL 4.1.3, the server always operates in a single time zone. This time zone applies to all client connections even if the client happens to be in a different time zone. The timezone system variable indicates the time zone in which the server operating. (This variable was renamed to system_time_zone in MySQL 4.1.3.) You can cause the server to set its timezone variable by setting the TZ environment variable or by using the --timezone option to mysqld_safe, as described earlier.
Selecting the Language for Error Messages
The MySQL server can produce diagnostic and error messages in any of several languages. The default is english, but you can specify others. To see which are available, look under the share/mysql directory of your MySQL installation. The directories that have language names correspond to the available languages. To change the message language, use the --language startup option. The argument can be either the language name or the pathname to the language directory. For example, to use French if your installation is located under /usr/local/mysql, you can use either --language=french or --language=/usr/local/mysql/share/mysql/french.
Configuring Character Set Support
Beginning with version 4.1, MySQL supports multiple character sets simultaneously. You can select character sets at the server, database, table, column, and string constant levels. Character sets determine which characters are allowed in string values. MySQL also supports multiple collating sequences per character set. Collations affect string comparison and sorting operations.
This section describes how to configure MySQL's character set support. For general background on the server's character set capabilities, see Chapter 2, "MySQL SQL Syntax and Use." For details on creating character columns and using them, see Chapter 3, "Working with Data in MySQL."
Note: Prior to MySQL 4.1, the server supports only a single character set at a time, although you can select which one to use at server startup. If you upgrade a pre-4.1 installation to 4.1 or later, you should convert tables that contain character columns to the current format. See "Upgrading Older Tables to Enable MySQL 4.1 Character Set Support" for details.
When you configure the server at build time, you can specify which character sets the server should support, as well as the default character set and collation. Use the following options to the configure script:
Here is a sample configuration command that uses all three options:
% ./configure --with-charset=utf8 \ --with-collation=utf8_icelandic_ci \ --with-extra-charsets=all
At runtime, the server sets its default character set and collation to the built-in defaults unless you specify otherwise. To select different values, use the --character-set-server and --collation-server startup options. The collation must be compatible with the character set.
When you run a client program, you can specify the character set that you want the program to use by giving the --default-character-set option. If the character set you want isn't available as part of your MySQL installation, but you do have the necessary character set files installed under another directory, you can tell the client program their location by specifying the --character-sets-dir option.
Upgrading Older Tables to Enable MySQL 4.1 Character Set Support
Before MySQL 4.1, the server always uses a single character set. The default is latin1 but that can be changed if you start the server with the --default-character-set option. If your tables all use the default character set, you can upgrade them to MySQL 4.1 format in straightforward fashion with no character set conversion necessary.
However, some people use 4.0 servers to store data in character sets other than the default. This requires that applications know the server's limitations and compensate for them. A common technique is to "fake it" by storing information that really isn't in the server's default character set and then interpreting it using the proper character set on the client side. For example, your server might use latin1 as the default character set, but you have applications that need to work with UTF-8 information. One workaround is to store the raw character information in a latin1 CHAR or TEXT column, and have the application interpret column values as UTF-8 on the client side. In MySQL 4.1 and up, you should convert your tables so that their columns definitions explicitly indicate that they have a character set of utf8. However, you must be careful when doing so to avoid loss of information due to inappropriate conversions.
The following sections describe table upgrading procedures to be used when upgrading to MySQL 4.1 or later. Use the first procedure if your tables all use the default character set, and the second if they don't. It's a good idea to make a backup first.
Upgrading Tables When Character Set Conversion Is Unnecessary
If your tables use the default character set and you just want to upgrade them to the current table format, use this procedure:
If you don't want to dump and reload all your tables, you can upgrade MySQL and then convert individual tables by using statements of this form:
ALTER TABLE tbl_name DEFAULT CHARACTER SET charset;
tbl_name is the name of the table to convert, and charset is the name of the server's default character set.
This latter individual-table upgrade method does have one caveat if you run the server with a multi-byte default character set: The server will appear to believe that the lengths of character columns have decreased after you upgrade to 4.1. This occurs due to the change from byte-oriented semantics in 4.0 to character-oriented semantics in 4.1. Suppose that you have a table created using a MySQL 4.0 server with the following definition:
CREATE TABLE t (c CHAR(100));
If you use a multi-byte character set, the server will show a different table definition after you upgrade to MySQL 4.1 or up. For example, if the server uses the sjis double-byte character set, a SHOW CREATE TABLE t statement will display the following table definition after the MySQL upgrade:
CREATE TABLE t (c CHAR(50)) ENGINE=MyISAM DEFAULT CHARSET=sjis;
This happens because when the 4.1 server sees a 4.0 table, it knows that it must be conservative and assume that the column can hold only as many instances of the widest character from the character set that will fit. sjis characters take two bytes, so the server knows it can fit at most 50 two-byte characters into 100 bytes. Your tables still contain the same data, but if you want to make sure that you can insert 100-character strings into the column in the future, convert it with ALTER TABLE:
ALTER TABLE t MODIFY c CHAR(100);
Upgrading Tables When Character Set Conversion Is Necessary
If your tables contain string columns in which you've been storing data that is not really in the server's default character set, you should convert the tables to assign the proper character set explicitly to those columns. Then the server will handle their contents properly with no special handling. However, you must be careful when assigning character sets to avoid having the server improperly convert column contents. The following conversion procedure should be performed for each table individually after upgrading MySQL to 4.1 or later.
CREATE TABLE t ( c1 CHAR(75), c2 CHAR(125) );
To convert columns in the table to have explicit character set information, use these statements:
ALTER TABLE t MODIFY c1 CHAR(75) CHARACTER SET binary, MODIFY c2 CHAR(125) CHARACTER SET binary; ALTER TABLE t MODIFY c1 CHAR(75) CHARACTER SET utf8, MODIFY c2 CHAR(125) CHARACTER SET ujis;,
The reason that two steps are necessary is that if you try to convert the columns directly to utf8 and ujis, the server will think it should convert them from latin1 to those character sets. That won't work because the values are not really latin1 and the conversion would mangle them. Instead, use one step that converts the columns to the binary character set, and a second step to convert from binary to the desired character sets. This works properly because conversions to and from the binary character set result in no attempt by the server to interpret the column contents.
There is a special case for which only the second step is necessary. If you were storing strings in a column that already had a binary string data type, you can convert the column directly to the desired character set. For example, if you were storing utf8 strings in a BLOB column, you can convert it directly to TEXT CHARACTER SET utf8. If you were using a 4.0 CHAR(M) BINARY column, 4.1 considers it to have the BINARY(M) data type. You can convert it directly to CHAR(M) CHARACTER SET utf8.