Addison-Wesley, 2002. — 896 p.
Foreword: Unicode began with a simple goal: to unify the many hundreds of separate character encodings into a single, universal standard. These character encodings were incomplete and inconsistent: Two encodings would use the same number for two different characters, or use different numbers for the same character. Any given computer, servers in particular, needed to support many different encodings; yet whenever data was passed between different encodings or platforms, that data always ran the risk of corruption.
Unicode was designed to fix that situation: to provide a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
Unfortunately, Unicode has not remained quite as simple as we originally planned. Most of the complications derive from three unavoidable sources. First, written languages are complex: They have evolved in strange and curious ways over the centuries. Many oddities end up being reflected in the way characters are used to represent those languages, or reflected in the way strings of characters function in a broader context. Second, Unicode did not develop in a vacuum: We had to add a number of features — and a great many characters — for compatibility with legacy character encoding standards. Third, we needed to adapt Unicode to different OS environments: allowing the use of 8-, 16-, or 32-bit units to represent characters, and either big- or little-endian integers. Of course, in addition to these three factors, there are in hindsight parts of the design that could have been simpler (true of any project of such magnitude).
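To make the third point concrete, the sketch below (not from the book; the sample character U+00E9 is my own choice) shows one character encoded with 8-, 16-, and 32-bit code units in both byte orders, using standard Python codecs:

```python
# Illustrative sketch: the same character, U+00E9 (LATIN SMALL LETTER E WITH ACUTE),
# encoded with 8-, 16-, and 32-bit code units and both big- and little-endian byte order.
ch = "\u00e9"

for encoding in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    data = ch.encode(encoding)
    hex_bytes = " ".join(format(b, "02x") for b in data)
    print(f"{encoding:10s} -> {hex_bytes}")

# Expected output:
# utf-8      -> c3 a9
# utf-16-le  -> e9 00
# utf-16-be  -> 00 e9
# utf-32-le  -> e9 00 00 00
# utf-32-be  -> 00 00 00 e9
```

The character's Unicode number (its code point) is the same in every case; only the encoding form and byte order of the stored units differ.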