Japanese and Chinese Character Codes

The ASCII code, a 7-bit code that represents the upper- and lowercase Roman letters, numerals, punctuation, control codes, and special characters, is used in the United States and throughout the rest of the world. ASCII is both a symbol set (a definition of what is and is not an ASCII character) and a code space (the specific numeric code assigned to each symbol).

In Japan, the Japanese Industrial Standard (JIS) defines a standard set of Kanji, together with Hiragana, Katakana, Roman and Greek characters, numbers, and special symbols. JIS uses a 16-bit (2-byte) code in which each Japanese character is represented by two consecutive bytes in a string.

Since ASCII is the global standard, a modification of JIS, called Shift-JIS, was developed to allow Japanese characters to be intermixed with ASCII characters. The first byte of a 2-byte Shift-JIS character has its high bit set, so it does not match any 7-bit ASCII character.

Another flavor of JIS, called EUC (Extended Unix Code), is formed from JIS by setting the high bit of both bytes. Other minor variations of these formats proliferated in the 1980s as Japanese manufacturers sought to keep their hardware and software proprietary and their customers captive.
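The JIS-to-EUC relationship can be sketched in a few lines of Python. The helper name jis_to_euc is hypothetical, and the sketch assumes a plain 2-byte JIS code with both bytes in the printable range 0x21-0x7E; it uses Python's built-in "euc_jp" codec only to check the result.

```python
def jis_to_euc(jis_pair: bytes) -> bytes:
    """Convert one 2-byte JIS code to EUC by setting the high bit of each byte."""
    if len(jis_pair) != 2 or not all(0x21 <= b <= 0x7E for b in jis_pair):
        raise ValueError("expected two bytes in the range 0x21-0x7E")
    return bytes(b | 0x80 for b in jis_pair)


jis_a = bytes([0x24, 0x22])      # JIS code for the hiragana あ
euc_a = jis_to_euc(jis_a)
print(euc_a.hex())               # a4a2
print(euc_a.decode("euc_jp"))    # あ
```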

Two Chinese standards have emerged from an initial flurry of competing codes to become dominant: the "Big Five" code from Taiwan and the Guo Biao (GB) code from mainland China. Roughly speaking, Big Five encodes traditional characters while Guo Biao encodes simplified characters.
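The traditional/simplified split can be probed with Python's built-in "big5" and "gb2312" codecs. This is only an illustrative sketch: can_encode is a hypothetical helper, and the example pair (traditional 國 and its simplified form 国) was chosen on the assumption that each form appears in only one of the two codes.

```python
def can_encode(ch: str, codec: str) -> bool:
    """True if the character has a code in the given codec's symbol set."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for ch in ("國", "国"):          # traditional form, then simplified form
    for codec in ("big5", "gb2312"):
        print(ch, codec, can_encode(ch, codec))
```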

Unicode, a new standard used in Windows, is a symbol set and code space that unifies hanzi (Chinese characters), kanji (Japanese characters), and hanja (Korean characters) into one symbol set. However, a Chinese Unicode font cannot be used to print Japanese, and vice versa, because of stylistic differences that would be objectionable to native speakers.
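As a rough illustration of unification, the Python sketch below encodes one character shared by Japanese and Chinese in three national codes and then decodes each result. The choice of the character 山 and the list of codecs are assumptions made for the example; the bytes differ from code to code, but every one decodes to the same single Unicode code point.

```python
LEGACY_CODECS = ["shift_jis", "big5", "gb2312"]

ch = "山"  # "mountain", written the same way in Japanese and Chinese

# Each national code assigns this character its own byte sequence.
encoded = {codec: ch.encode(codec) for codec in LEGACY_CODECS}

# Decoding any of them yields the one unified Unicode character.
decoded = {codec: raw.decode(codec) for codec, raw in encoded.items()}

for codec, raw in encoded.items():
    print(f"{codec:10} {raw.hex()} -> U+{ord(decoded[codec]):04X}")
```

The code point says nothing about how the glyph should be drawn, which is consistent with the font problem described above.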

(This is just the tip of the iceberg. For a definitive look at the state of the existing national and international standards, check out Ken Lunde's CJK.inf.)

Because Asian word processors typically work with only one symbol set, they cannot display both Japanese and Chinese in the same document. Smart Characters has a unique symbol set and code space that combines both traditional Chinese and Japanese characters so they can be freely intermixed within a document. Additionally, Smart Characters can read and write JIS, Big Five, and GB codes, separately or in the same document.

Due to the memory limitations of early machines and the large number of Chinese characters, Asian character sets are divided into levels. Level I consists of the most-used characters in the language, while Level II characters are rarely used. Characters that are not in the symbol set have to be created by hand using a font editor: they are called user characters. One difficulty with user characters is that there is no standard for electronic transmission, so user characters created on one system will display as incorrect characters on another. Smart Characters solves this problem by embedding user characters in a document for later extraction, display, and possible adoption by the recipient.

This background information is provided to help those unfamiliar with the Chinese or Japanese languages understand the terms and concepts discussed in the Apropos Smart Characters product literature.

Apropos Customer Service home page 617-648-2041
Last Modified: August 28, 1996