2.1 Character Set
JavaScript programs are written using the
Unicode character set. Unlike the 7-bit
ASCII encoding, which is useful only
for English, and the 8-bit ISO Latin-1 encoding, which is useful only
for English and major Western European languages, the 16-bit Unicode
encoding can represent virtually every written language in common use
on the planet. This is an important feature for internationalization
and is particularly important for programmers who do not speak
English.
American and other English-speaking programmers typically write
programs using a text editor that supports only the ASCII or Latin-1
character encodings, and thus they don't have easy access to
the full Unicode character set. This is not a problem, however,
because both the ASCII and Latin-1 encodings are subsets of Unicode,
so any JavaScript program written using those character sets is
perfectly valid. Programmers who are used to thinking of characters
as 8-bit quantities may be disconcerted to know that JavaScript
represents each character using 2 bytes, but this fact is actually
transparent to the programmer and can simply be ignored.
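The 16-bit representation can be observed, though, through the string methods that expose character codes. A brief sketch (the variable names are illustrative only):

```javascript
// A non-ASCII character written with a Unicode escape: "\u00e9" is é.
var s = "caf\u00e9";

// The escape denotes a single character, stored as one 16-bit value,
// so the string length is 4, not 5.
console.log(s.length);        // 4

// charCodeAt() returns the 16-bit code of the character at an index.
console.log(s.charCodeAt(3)); // 233, the code for é (U+00E9)
```

As the text notes, nothing here requires the programmer to care that each character occupies 2 bytes; lengths and indexes count characters, not bytes.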
Although the
ECMAScript v3 standard allows Unicode
characters anywhere in a JavaScript program, Versions 1 and 2 of the
standard allow Unicode characters only in comments and quoted string
literals -- all other parts of an ECMAScript v1 or v2 program are
restricted to the ASCII character set. Versions of JavaScript that
predate ECMAScript standardization typically do not support Unicode
at all.
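The difference between the standards can be sketched as follows (the identifier name is a hypothetical example, not from the text):

```javascript
// Legal under ECMAScript v3: Unicode letters may appear in identifiers.
var café = "coffee";

// Legal under ECMAScript v1 and v2 as well: Unicode confined to a
// comment (like this one: é) or a quoted string literal.
var cafe = "caf\u00e9";

console.log(café); // "coffee"
console.log(cafe); // "café"
```

A program meant to run on pre-ES3 interpreters would therefore keep its identifiers in plain ASCII and use Unicode escapes or literals only inside strings and comments.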