Apr 28, 2009

How many Chinese characters can a computer display?

Recently there was a news story titled "Name Not on Our List? Change It, China Says". Some people think about it from the perspective of culture, philosophy, or politics. But I am a software engineer, so let's talk about it from the perspective of computer science.

When we say "put your name in the computer", what does that mean? Let's say your name is "a". The computer converts it to the number 97, and this number is stored as a series of bits (0 or 1). This process is called encoding. But the text you see on screen is not a number. Why? Because the computer turns the number back into a character (this is called decoding) and then uses a font to draw the picture that belongs to that number.

The most popular encoding standard today is Unicode. It was originally designed to use 16 bits to store every character (no matter whether it is an English letter, a Chinese character, or a Greek letter; let's just call them all characters for now). 16 bits means 65536 possible combinations, so such a system can accommodate only 65536 characters. Unicode is still evolving and has since grown past that limit: its code space now has room for 1,114,112 numbers. But the fact remains that it can only handle a limited number of characters, and only those that have already been assigned a number. In Unicode, the English letter 'a' is converted to the number 97, and the Chinese character '一' (one) is converted to 19968.

Let's say I have a new baby and I want to give it a Chinese name. I look in an ancient Chinese dictionary, one that was created long before the modern computer was invented, and I find a character that is not in our encoding system, for example Unicode. Then there is no way to enter the name into existing computer systems.

What options do we have? We can update the encoding system to include your name. To do that, we need to create a number, create a picture, and associate the number with the picture. We also need to distribute the updated encoding system to all the computers. Obviously, that is not an easy job. The easier solution is to pick your name from the characters already in the existing encoding system, for example Unicode, rather than from an ancient Chinese dictionary, and certainly not to invent a Chinese character that never existed. That is what the Chinese government is trying to do. In fact, Unicode can already display more than 20,944 Chinese characters, more than even the most knowledgeable Sinologist is likely to use.
Some people will surely be affected, but out of 1.3 billion Chinese, the number is not that significant. Click here to see a list of the Chinese characters in Unicode, ranging from 19968 to 40911.
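
To make the numbers in this post concrete, here is a small Python sketch. It only uses the figures quoted above (97, 19968, and the range 19968 to 40911). The function names, and the idea that a registration system would check names this way, are my own assumptions for illustration, not how any real system is known to work.

```python
# Range of Chinese characters quoted in this post (as decimal Unicode numbers).
CJK_FIRST = 19968
CJK_LAST = 40911


def code_points(text):
    """Return the Unicode number of each character in text."""
    return [ord(ch) for ch in text]


def in_listed_range(ch):
    """True if the character's number falls inside the quoted range."""
    return CJK_FIRST <= ord(ch) <= CJK_LAST


def name_is_supported(name):
    """Hypothetical check: every character of the name is in the quoted range."""
    return all(in_listed_range(ch) for ch in name)


print(code_points("a"))           # [97]
print(code_points("一"))           # [19968]
print(chr(19968))                 # 一  (number back to character)
print("一".encode("utf-8"))        # b'\xe4\xb8\x80'  (the bits actually stored, as UTF-8)
print(CJK_LAST - CJK_FIRST + 1)   # 20944 numbers in the quoted range

print(name_is_supported("一二三"))              # True
# chr(0x20000) is a number outside the quoted range, used here only as a test case.
print(name_is_supported("一" + chr(0x20000)))   # False
```

A check like this only says whether a number has a place in the range. Whether the character shows up on screen still depends on the computer having a font with a picture for that number.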

