Scanning Documents and Importing Native Files

Using Optical Character Recognition

At present, most text exists, or is available, only in printed form on paper. Traditional methods of processing this text involve annotating it (or a copy) with a pencil, or typing it into a word processor. However, typing is exceedingly slow and error-prone, hence the need for an automatic method of converting printed text to electronic text. This process is called optical character recognition (11-2) (OCR). It involves scanning a page to produce an electronic image of the page, then processing the image to recognize the text. The recognized text is then checked against the original image, errors are corrected, and the text is saved in a standard native file format. Although full-page black-and-white scanned images are quite large (2-3 megabytes), the resulting text is only about 1/1000 of that size.
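The size figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming an uncompressed image at 1 bit per pixel, a US letter page (8.5 x 11 inches), and one byte per character of text. The page size, the resolutions, and the function names are illustrative assumptions, not values taken from Smart Characters or from any particular scanner.

```python
def bilevel_image_bytes(width_in, height_in, dpi):
    """Size in bytes of an uncompressed 1-bit-per-pixel scan (assumed format)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row of pixels is padded up to a whole number of bytes.
    row_bytes = (width_px + 7) // 8
    return row_bytes * height_px


def text_to_image_ratio(char_count, image_bytes, bytes_per_char=1):
    """Ratio of recognized text size to scanned image size."""
    return (char_count * bytes_per_char) / image_bytes


if __name__ == "__main__":
    for dpi in (300, 400, 500):
        size = bilevel_image_bytes(8.5, 11, dpi)
        print(f"{dpi} dpi: {size / 1_000_000:.2f} MB")
```

Under these assumptions, a 2-3 megabyte image corresponds to a scan at roughly 400 to 500 dots per inch, and a page of about 2,000 single-byte characters is about 1/1000 of the size of a 2-megabyte image. Asian text is commonly stored at two bytes per character, which can be modeled by passing bytes_per_char=2.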

Although Roman language OCR programs are fairly sophisticated, Asian language OCR is still in its infancy. The technical challenges include distinguishing among the characters of a set roughly 200 times larger, and the need to achieve essentially perfect accuracy. Consequently, Asian language OCR currently requires considerable human interaction to produce acceptable results.
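To see why accuracy must be essentially perfect, it helps to translate a per-character accuracy into errors per page. The sketch below assumes that recognition errors are independent and that a page holds 2,000 characters. Both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
def expected_errors(chars_per_page, accuracy):
    """Average number of misrecognized characters on a page."""
    return chars_per_page * (1.0 - accuracy)


def error_free_page_probability(chars_per_page, accuracy):
    """Chance that every character on the page is recognized correctly,
    assuming errors are independent of one another."""
    return accuracy ** chars_per_page


if __name__ == "__main__":
    for accuracy in (0.99, 0.999, 0.9999):
        errors = expected_errors(2000, accuracy)
        clean = error_free_page_probability(2000, accuracy)
        print(f"accuracy {accuracy}: {errors:.1f} errors/page, "
              f"{clean:.1%} chance of a clean page")
```

Even at 99 percent accuracy, this model leaves about 20 errors per page to find and correct by hand.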

Image contrast and brightness are critical to error-free recognition, so scanning parameters must be adjusted to yield the best possible image for recognition. Sometimes a compromise is required. After recognition is run, the correction process verifies the results against the original image and corrects any zoning and character recognition errors that are noticed. Verification and correction can be time-consuming. If poor image quality causes a large number of errors, it may be easier to re-scan and reprocess the page than to correct the errors.
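The effect of brightness and contrast settings can be illustrated with a simple fixed-threshold conversion from grayscale to black and white. This is only a sketch of the general idea, not the method used by any particular scanner or OCR program. The pixel values, the thresholds, and the function names are hypothetical.

```python
def binarize(gray, threshold):
    """Return a bilevel image: 1 = ink (darker than threshold), 0 = paper."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def ink_fraction(bilevel):
    """Fraction of pixels classified as ink."""
    total = sum(len(row) for row in bilevel)
    ink = sum(sum(row) for row in bilevel)
    return ink / total


# A tiny hypothetical grayscale patch (0 = black, 255 = white):
# one dark stroke (40), one faint stroke (120), slightly dirty paper (200).
PATCH = [
    [200, 40, 200, 120, 200],
    [200, 40, 200, 120, 200],
    [200, 40, 200, 120, 200],
]

if __name__ == "__main__":
    for threshold in (100, 160, 220):
        print(threshold, ink_fraction(binarize(PATCH, threshold)))
```

With the lowest threshold the faint stroke disappears, with the highest the dirty paper turns black, and only the middle setting keeps both strokes separate from the background. This is the kind of compromise the scanning parameters have to strike.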

To use an Asian language OCR (11-2) program to import scanned text:

Importing Native Files

You can import scanned files or plain text files from native word processors directly into Smart Characters, or run them through a translation or annotation step before importing them. If you add native fonts to Smart Characters, you can open the files in their native code spaces. See the File Format (3-2) dialog and Use Other Fonts (8-5).
