Symbol Sets

Chinese Characters

In the course of human events, tens of thousands of unique Chinese characters (hanzi in Chinese, kanji in Japanese) have been created and used. A character is defined by a specific combination of components, that is, groups of strokes(D- - 7). Components in a character are analogous to letters in a syllable. Maddeningly, certain components have different forms and stylistic variants, so any specific character can have an alternate form which, to the untrained eye, looks like a different character!

There are roughly 800 different components. At least one of 214 common components is found in every character. These common components are called radicals(D- - 6).

Traditional and Simplified

As a further complication, written Chinese uses two forms of characters: traditional and simplified(4- 10). Simplified forms are used in (mainland) China, while Taiwan and others use the traditional forms. Most Japanese kanji are also traditional Chinese characters, but a few are simplified, and a few are Japanese inventions (not from Chinese). A few components have typically Chinese and typically Japanese stylistic variants, but these variants are not significant enough to require separate characters to distinguish them.

Asian Symbol Sets

Because it is not practical to include all of the characters that have ever been used in one symbol set(D- - 7), various organizations have classified Chinese characters according to different schemes, resulting in the different standard symbol sets in use today. The world standards include China's GB2312, Taiwan's CNS11643 (Big 5 Code), Japan's JIS X0208 (JIS), Korea's KS C5601, and a new standard, Unicode. Asian word processors typically work with only one symbol set, with characters divided up into multiple levels(D- - 5). For example, Japanese JIS Level I contains the joyo kanji plus some other commonly used and proper name kanji, ordered by pronunciation. JIS Level II contains rare kanji arranged by radical(D- - 6) and stroke(D- - 7).

While Smart Characters is used primarily with the Chinese and Japanese Combined(4- 9) symbol set, a Smart Characters document can use up to five different fonts or symbol sets. You can convert between fonts and symbol sets by using ScConv(D- - 7).

Combined Symbol Set

Smart Characters is supplied with 16 and 24 point Chinese character fonts that use a Combined Japanese and Chinese symbol set(D- - 7). The combined symbol set contains traditional characters for Chinese and the Japanese joyo kanji. The traditional Japanese kanji are identical to their Chinese forms, with some stylistic differences. The unique or simplified joyo kanji are included in the combined symbol set separately at the end. The combined symbol set uses two levels(D- - 5): Level 0 (16h0ch00.fnt) contains 7731 traditional characters plus up to 69 user-defined characters. Level I (16h1ch00.fnt) adds about 400 characters to form a 100% concordance(D- - 2) to the Japanese JIS Level I and the Big Five(D- - 1) traditional Chinese code space(D- - 2).

Although the Combined symbol set unifies Japanese and Chinese characters, its degree of unification is less than that of the Unicode symbol set, which requires two separate fonts to display Japanese and Chinese characters. The Combined symbol set maintains separate characters if a unified character would be unacceptable to either a native Japanese or a native Chinese reader.

Symbol Set Unification

Ongoing work on international symbol set(D- - 7) unification by various organizations has yielded changes to accepted forms. The Combined(4- 9) symbol set reflects these changes by accepting unifications that are generally acceptable to native speakers, and unifying Japanese and Chinese characters when appropriate. Consequently, the combined symbol set contains now-unused code spaces that display as character numbers(D- - 2), not characters. You can see these numbers as you browse the combined font, or in documents created by prior versions of Smart Characters. See the Unify(D- - 8) symbol set unification utility.

Simplified Characters

The Combined(4- 9) symbol set combines both traditional Chinese and Japanese characters into one symbol set(D- - 7) for free intermixing within a document. Simplified characters used in mainland China are supported by the optional (not included) accessory simplified character fonts.

User Characters

User characters are characters that you need to use that are not in a standard symbol set(D- - 7). Standard symbol sets define only a few thousand of the most common Chinese characters, and are therefore small subsets of the over 50,000 characters that have been used over the years. Consequently, the need frequently arises to use rare or obsolete characters that do not exist in a particular standard symbol set. To use these characters, add them to your user font(4- 12), and add a corresponding reference(8- 2) to your user dictionary(4- 7) for lookup and use. See Adding New Characters(8- 2) and Why Is this Necessary (Chinese)(2C- 45), (Japanese)(2J- 44).

Sharing User Characters

Most Asian language word processing systems do not contemplate electronic document transmission or work group document sharing, so user characters(4- 10) are generally not practical to transmit. Smart Characters eliminates this restriction by extracting the user characters actually used into a small proxy font(D- - 6), and embedding the proxy font into the document for later extraction when the document is opened. The document displays the correct user characters, without transmitting and installing the author's user font(4- 12).
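The extraction step described above can be pictured with a minimal Python sketch. It is an illustration only and does not reflect the actual Smart Characters file format: the document is modeled as a list of hypothetical (symbol set index, character code) pairs, the user font as a dictionary of hypothetical glyph data, and the function name build_proxy_font is invented for this example. The only detail taken from this manual is the convention that document symbol set 3 holds user characters.

```python
USER_SET_INDEX = 3  # convention from this manual: index 3 is used for user characters


def build_proxy_font(document_chars, user_font):
    """Return a small font holding only the user characters the document uses.

    document_chars: iterable of (symbol_set_index, code) pairs (hypothetical model)
    user_font: dict mapping character code -> glyph data (hypothetical model)
    """
    # Collect each user character code once, however often it appears.
    used_codes = {code for index, code in document_chars if index == USER_SET_INDEX}
    # Copy only those glyphs; the rest of the author's user font is left behind.
    return {code: user_font[code] for code in sorted(used_codes) if code in user_font}


# Hypothetical author's user font with three characters.
user_font = {0x01: b"glyph-A", 0x02: b"glyph-B", 0x03: b"glyph-C"}

# Hypothetical document: two standard characters and three user character uses.
document = [(0, 0x4E00), (3, 0x02), (0, 0x4E8C), (3, 0x02), (3, 0x03)]

proxy_font = build_proxy_font(document, user_font)
```

In this sketch the proxy font ends up with two glyphs, because the third user character is never used in the document. That is the point of the technique: the recipient receives only the glyphs needed to display the document, not the author's whole user font.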

Symbol Set Index

A document can use up to five symbol sets, although most documents use just one symbol set(D- - 7). Documents that use simplified characters(4- 10), user-created characters, or lists such as the concordance files(12- 6) used by the document conversion utility ScConv(D- - 7) use more than one symbol set.

Smart Characters documents contain a symbol set index that holds up to five symbol set names, unique IDs, and encoding methods(D- - 3). Chinese characters are interpreted according to the symbol set and encoding method with which they are associated. Chinese characters are associated with a symbol set by applying a symbol set index format code(D- - 3) using the Format Character(3- 16) dialog Asian Character Symbol Set control. This format code includes only the index number from 0 to 4, not the actual symbol set name and unique ID.

By convention, some of the index numbers have pre-defined meanings: Document symbol set 0 is used for the default symbol set. Document symbol set 3 is used for user characters(4- 10). Document symbol set 4 is used for a proxy font(D- - 6).

  1. Before a symbol set can be used in a document, it must be registered to a symbol set index. The user characters symbol set is automatically registered the first time a user character is pasted from the List(4- 4) window. Other symbol sets must be registered as required.
  2. The Pick Symbol Set dialog registers and manages symbol sets in a document, enabling the document to display characters from up to four different symbol sets. Typical uses include displaying simplified(4- 10) and traditional Chinese characters, user characters, and symbol sets corresponding to fonts from native word processors (e.g., JIS, Big5, and GB). A symbol set is identified by its name and unique symbol set ID(8- 4).
  3. The Character Format | Register Symbol Set button invokes the Register Symbol Set(3- 18) dialog to specify symbol set indexes, providing greater control (e.g., troubleshooting and automatic installation of fonts) than the simpler and quicker Format Pick Symbol Set(3- 24) dialog.
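As a rough illustration of how a five-slot symbol set index might behave, the following Python sketch models registration and lookup. It is not the Smart Characters implementation: the class names, the integer unique IDs, the encoding strings, and the sample symbol set names are all assumptions made for the example. Only the five index numbers (0 to 4), the pre-defined meanings of indexes 0, 3, and 4, and the rule that a symbol set must be registered before use come from the text above.

```python
from dataclasses import dataclass

MAX_SYMBOL_SETS = 5   # index numbers 0 through 4
DEFAULT_INDEX = 0     # by convention: the default symbol set
USER_INDEX = 3        # by convention: user characters
PROXY_INDEX = 4       # by convention: a proxy font


@dataclass(frozen=True)
class SymbolSetEntry:
    name: str
    unique_id: int    # hypothetical representation of the unique symbol set ID
    encoding: str     # hypothetical label for the encoding method


class SymbolSetIndex:
    """Per-document table mapping an index number to a registered symbol set."""

    def __init__(self):
        self.slots = [None] * MAX_SYMBOL_SETS

    def register(self, index, entry):
        if not 0 <= index < MAX_SYMBOL_SETS:
            raise ValueError(f"symbol set index must be 0..{MAX_SYMBOL_SETS - 1}")
        self.slots[index] = entry

    def resolve(self, index):
        """Look up the symbol set that a character's format code points to."""
        if not 0 <= index < MAX_SYMBOL_SETS:
            raise ValueError(f"symbol set index must be 0..{MAX_SYMBOL_SETS - 1}")
        entry = self.slots[index]
        if entry is None:
            raise LookupError(f"no symbol set registered at index {index}")
        return entry


# Hypothetical names, IDs, and encodings, for illustration only.
doc_index = SymbolSetIndex()
doc_index.register(DEFAULT_INDEX, SymbolSetEntry("Combined", 1, "two-byte"))
doc_index.register(USER_INDEX, SymbolSetEntry("User", 2, "two-byte"))

default_set = doc_index.resolve(DEFAULT_INDEX)
```

Because a character's format code stores only the index number, the same character code can be interpreted differently depending on which entry occupies that slot in the document. Resolving an unregistered index fails in this sketch, mirroring the requirement that a symbol set be registered before it is used.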

