
A Study On Coding Conversion Of The Mongolian

Posted on: 2009-04-06  Degree: Master  Type: Thesis
Country: China  Candidate: Z Gong  Full Text: PDF
GTID: 2178360245486797  Subject: Computer technology
Abstract/Summary:
Mongolian information processing began in the late 1970s. As computer technology was applied to it, many research units made significant advances in Mongolian word processing. For this work to proceed, several fundamental technical problems must be solved: Mongolian character coding, Mongolian input, and a Mongolian word bank. In coding, the research units worked relatively independently, and because the state did not set a unified standard in time, each adopted its own Mongolian coding system based on word shapes. Only in 1993 was an international standard coding defined, in ISO/IEC 10646.

Research on Mongolian information processing started with typesetting (word composition). A typesetting system is concerned mainly with the shape of words, and it has served its purpose once each word appears in the correct shape. Mongolian coding schemes were therefore built on shape codes. In Mongolian script, a single shape often has several pronunciations, and some partial shapes recur as components of many different characters. As a result, the research units chose different schemes when defining their shape codings. Some assign one code to each character shape, even though that shape may stand for letters with different pronunciations. Others assign more than one code to the same character. Still others redefine characters from partial components shared by several letters, or recombine strokes within letters to suit writing habits or aesthetics, and then define one code for each resulting character.

As Mongolian information processing developed further, people gradually recognized the problems caused by these differences in coding. Because the coding systems are mutually incompatible, information resources in different codings cannot be shared, and substantial human, material, and financial resources have been wasted on duplicated development.

This paper mainly discusses conversion between the Menkeli, Oyuta, and Saiyin Mongolian codings and the international standard Mongolian coding. "Mongolian" here refers specifically to traditional Mongolian and excludes Todo, Sibe, Manchu, and Ali Gali. The Menkeli, Oyuta, and Saiyin codings are shape coding schemes based on Unicode, while the target of the conversion is the Mongolian national standard coding, which is being submitted for official approval. The conversion is carried out in three steps. First, analyze the characteristics of each coding, define conversion rules, and perform a preliminary conversion by program. Second, build a Mongolian dictionary bank to check the accuracy of the converted words. Third, build a parallel corpus to add vocabulary to the dictionary bank and to resolve uncertain conversions.
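The three steps can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the shape codes below are made-up private-use code points, not real Menkeli, Oyuta, or Saiyin values, and the function names are hypothetical. The only real values are the standard Mongolian letters U+1820 (A), U+1821 (E), U+1828 (NA), and U+182A (BA). The sketch assumes that one shape code may map to several candidate letters, and that a dictionary decides between them.

```python
from itertools import product

# Step 1 rule table: hypothetical shape code -> candidate standard letters.
# The keys are invented private-use code points used only for illustration.
SHAPE_TO_LETTERS = {
    "\uE001": ["\u1820"],            # a shape that can only be A
    "\uE002": ["\u1820", "\u1821"],  # a shape shared by A and E (ambiguous)
    "\uE003": ["\u1828"],            # a shape that can only be NA
    "\uE004": ["\u182A"],            # a shape that can only be BA
}


def candidates(shape_word):
    """Step 1: apply the rules and return every possible standard spelling."""
    options = []
    for ch in shape_word:
        if ch not in SHAPE_TO_LETTERS:
            raise KeyError(f"no conversion rule for U+{ord(ch):04X}")
        options.append(SHAPE_TO_LETTERS[ch])
    return ["".join(combo) for combo in product(*options)]


def convert_word(shape_word, dictionary):
    """Step 2: use the dictionary to pick among candidates.

    Returns (converted_word, certain). `certain` is False when the
    dictionary cannot single out exactly one candidate.
    """
    cands = candidates(shape_word)
    if len(cands) == 1:
        return cands[0], True
    hits = [c for c in cands if c in dictionary]
    if len(hits) == 1:
        return hits[0], True
    return cands[0], False  # fall back to the first candidate, flagged uncertain


def learn_from_parallel(pairs, dictionary):
    """Step 3: add words from aligned (shape_word, standard_word) pairs.

    A pair is accepted only if the standard word is consistent with the rules.
    """
    for shape_word, standard_word in pairs:
        if standard_word in candidates(shape_word):
            dictionary.add(standard_word)


if __name__ == "__main__":
    dictionary = set()
    word = "\uE004\uE002"                      # BA + ambiguous A/E shape
    print(convert_word(word, dictionary))      # uncertain before learning
    learn_from_parallel([(word, "\u182A\u1821")], dictionary)
    print(convert_word(word, dictionary))      # resolved after learning
```

Returning a `certain` flag rather than silently picking a candidate keeps the uncertain conversions visible, so they can be revisited once the parallel corpus has added more vocabulary.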
Keywords/Search Tags:Mongolian, character coding, coding conversion