
Universal multilingual information interchange system with character reader and terminal

Posted on: 1991-09-10
Degree: Ph.D.
Type: Dissertation
University: Concordia University (Canada)
Candidate: Krishnamoorthy, Suban
GTID: 1478390017451042
Subject: Information Science
Abstract/Summary:
The need for a universal multilingual information interchange system with a character reader and terminal has been emphasized. A scheme for recognizing machine-printed and handprinted Indian characters has been developed. In this scheme, characters are assumed to be composed of symbols, which are in turn composed of line-like elements, called primitives, that satisfy certain structural constraints. Attribute graphs describe the structural composition of each symbol in terms of its primitives and the relational constraints that hold among them. Recognition proceeds in two stages. In the first stage, correlation coefficients are computed against the attribute graphs stored in the knowledge base for a set of basic graphic symbols, and these coefficients are maximized to identify the symbols that constitute the input character. In the second stage, the input character is recognized from its graphic symbols using a decision tree. The recognizer can also detect invalid combinations of graphic symbols that do not constitute a valid character. Preprocessing techniques for converting the input image into an attribute graph are discussed. A coding scheme has been developed to describe multilingual texts. Tamil and Malayalam characters were used to test the recognizer. For Tamil, the recognition, rejection and substitution rates were 91.33%, 6.83% and 1.83%, respectively; for Malayalam, they were 89.5%, 8.3% and 2.2%.

A design is presented for a keyboard-based multilingual terminal system for Indian languages. An interactive, computer-aided methodology based on pattern recognition has been developed to identify the best dot matrix size for a given optimality criterion. Using a graphics terminal with a resolution of 60 dots per inch, the nearly optimal dot matrix size for the Tamil symbols was computed to be 11 x 14; from this symbol size, the character size was computed to be 15 x 18.
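The two-stage recognition process described earlier in the abstract can be illustrated with a small Python sketch. It is not the author's implementation. It assumes that each attribute graph has already been flattened into a fixed-length numeric vector (a strong simplification of a real attribute graph), uses the Pearson correlation coefficient as the match score, and encodes the decision tree as nested dictionaries keyed by symbol name. The prototype vectors, the symbol names (A, B, C), the character labels and the 0.8 rejection threshold are all hypothetical.

```python
from math import sqrt

# HYPOTHETICAL knowledge base: each basic graphic symbol is reduced to a
# fixed-length attribute vector (e.g. counts of primitive types/relations).
# Real attribute graphs carry far more structure than this.
SYMBOL_PROTOTYPES = {
    "A": [2, 1, 0, 1, 0],
    "B": [0, 2, 1, 0, 1],
    "C": [1, 0, 2, 1, 1],
}

# HYPOTHETICAL decision tree: each level branches on the next recognised
# symbol; the key None marks a complete, valid character.
DECISION_TREE = {
    "A": {None: "char_1", "B": {None: "char_2"}},
    "C": {"B": {None: "char_3"}},
}


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / sqrt(vx * vy)


def recognize_symbol(vector, threshold=0.8):
    """Stage 1: return the prototype with maximal correlation, or None to reject."""
    best, best_r = None, -1.0
    for name, proto in SYMBOL_PROTOTYPES.items():
        r = correlation(vector, proto)
        if r > best_r:
            best, best_r = name, r
    return best if best_r >= threshold else None


def recognize_character(symbol_vectors):
    """Stage 2: walk the decision tree with the recognised symbols.

    Returns the character label, or None if a symbol is rejected or the
    symbol sequence does not form a valid character.
    """
    node = DECISION_TREE
    for vec in symbol_vectors:
        sym = recognize_symbol(vec)
        if sym is None or sym not in node:
            return None
        node = node[sym]
    return node.get(None)


if __name__ == "__main__":
    print(recognize_character([[2, 1, 0, 1, 0], [0, 2, 1, 0, 1]]))  # char_2
```

In this sketch a character is rejected in two ways: when a symbol's best correlation falls below the threshold, and when well-recognised symbols form a combination (such as B alone, or C without a following B) that has no leaf in the tree.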
An iterative method for determining the most distinct set of dot matrix characters, based on the distances between the characters and their information content, is described. Symbol-based keyboard design and character generation methodologies have been developed.

Finally, the design of a multilingual data communication system that combines the multilingual character reader and terminal into an integrated information interchange system is considered. Communication aspects such as character coding, multilingual document representation, and the protocol changes needed for multilingual information interchange are described in detail. A methodology is also described for handling multilingual texts with existing programming language tools, such as compilers, for software development.
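The iterative selection of a maximally distinct set of dot matrix characters, mentioned in the abstract, can be sketched in Python as follows. This is a hypothetical illustration, not the procedure from the thesis. It assumes that each character has a few candidate bitmaps of the same size, uses the Hamming distance between bitmaps as the distance measure, and models only the distance criterion; the information-content term is omitted. The function names and the toy 3 x 3 glyphs are illustrative.

```python
# HYPOTHETICAL sketch: iterative selection of mutually distinct dot-matrix
# glyphs. A glyph is a tuple of equal-length row strings; '#' is a lit dot.
# Only a distance criterion is modelled (no information-content term).


def hamming(a, b):
    """Count the dot positions at which two equal-size bitmaps differ."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def min_distance(glyph, others):
    """Smallest Hamming distance from glyph to any glyph in others (0 if none)."""
    return min((hamming(glyph, o) for o in others), default=0)


def select_distinct(candidates, max_iters=10):
    """candidates: dict mapping each character to a list of candidate bitmaps.

    Start from each character's first candidate, then repeatedly swap in the
    candidate that maximises the minimum distance to all other chosen glyphs,
    stopping when a full pass makes no swap (or after max_iters passes).
    """
    chosen = {c: variants[0] for c, variants in candidates.items()}
    for _ in range(max_iters):
        changed = False
        for c, variants in candidates.items():
            others = [g for k, g in chosen.items() if k != c]
            best = max(variants, key=lambda g: min_distance(g, others))
            if min_distance(best, others) > min_distance(chosen[c], others):
                chosen[c] = best
                changed = True
        if not changed:
            break
    return chosen


if __name__ == "__main__":
    toy = {
        "x": [("#..", ".#.", "..#"), ("#.#", ".#.", "#.#")],
        "y": [("#..", ".#.", "..#"), ("...", "###", "...")],
    }
    result = select_distinct(toy)
    print(result, hamming(result["x"], result["y"]))
```

On the toy input, both characters start with the same diagonal stroke (distance 0), and the loop settles on two alternatives that differ in six of the nine dots. A swap can lower another character's minimum distance, which is why the sketch caps the number of passes instead of assuming convergence.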
Keywords/Search Tags: Multilingual, Information interchange system, Character, Terminal, Dot matrix, Size, Using