Chapter 4. i18n, L10n, and Unicode
Internationalization, localization, and Unicode are all hot topics in the field of modern web application development. If you build and launch an application without support for multiple languages, you're going to be missing out on a huge portion of your possible user base. Current research suggests that there are about 510 million English-speaking people in the world. If your application caters only to English speakers, you've immediately blocked 92 percent of your potential global audience. These numbers are actually wildly inaccurate and generally used as a scare tactic; you have to consider how many of the world's six billion or so people are online to begin with. But even once we factor this in, we are still left with 64 percent of online users (around 680 million people) who don't speak English (these statistics come from the Global Reach web site: http://global-reach.biz/). That's still a huge number of potential users you're blocking from using your application.
Addressing this problem has historically been a huge deal. Developers would need advanced knowledge of character sets and text processing, language-dependent data would need to be stored separately, and data from one group of users could not be shared with another. But in a world where the Internet is becoming more globally ubiquitous, these problems needed solving. The solutions that were finally reached cut out a lot of the hard work for developers: it's now almost trivially easy to create a multilanguage application, with only a few simple bits of knowledge.
This chapter will get you quickly up to speed on the issues involved in internationalization and localization, and suggest simple ways to solve them. We'll then look at Unicode in detail, explaining what it is, how it works, and how you can implement full Unicode applications quickly and easily. We'll touch on the key areas of data manipulation in web applications where Unicode has a role to play, and identify the potential pitfalls associated with each.