Research and Implementation of Conversion from Mongolian Presentation Characters to Basic Letters

Posted on: 2011-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: D B L Ao
Full Text: PDF
GTID: 2178360305991258
Subject: Computer Science and Technology
Abstract/Summary:
With the continued development of computer application technology, Mongolian information processing has made great progress. Research on Mongolian information processing in China began in the late 1970s and focused on Mongolian word processing, in which character codes were usually designed according to glyph shape. However, Mongolian has letters that are "similar in shape but distinct in pronunciation" and letters that are "similar in pronunciation but distinct in shape", so a coding scheme designed by glyph shape cannot satisfy the demands of further research on Mongolian information processing.
In 2000, the Mongolian character set was defined in ISO/IEC 10646 (Unicode). It has 35 Mongolian basic letters, also known as Mongolian nominal characters. The nominal character coding system considers both the pronunciation of a Mongolian letter and its shape. It accords with the way Mongolian words are spelled and is beneficial to the international standardization of Mongolian information processing.
In actual applications, Mongolian text is displayed using the presentation characters (presentation forms) of the nominal characters. Because presentation character codes were not unified across systems, and because in the early period people entered words according to their shape, existing Mongolian electronic documents contain many spelling errors. A misspelled word looks the same as the correct word, but its internal code in the computer is different. Therefore, documents produced by different Mongolian word processing systems cannot be exchanged or shared directly.
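The "same shape, different internal code" problem can be illustrated with a small sketch. It assumes a simplified, hypothetical shape table in which the Unicode letters O (U+1823) and U (U+1824), and OE (U+1825) and UE (U+1826), are treated as visually indistinguishable; the table and the function names are illustrative and are not taken from the thesis.

```python
import unicodedata

# Hypothetical, simplified shape classes: nominal letters that are written
# with the same glyph and therefore cannot be told apart on screen.
SHAPE_CLASS = {
    "\u1823": "O/U",    # MONGOLIAN LETTER O
    "\u1824": "O/U",    # MONGOLIAN LETTER U
    "\u1825": "OE/UE",  # MONGOLIAN LETTER OE
    "\u1826": "OE/UE",  # MONGOLIAN LETTER UE
}


def shape_key(word):
    """Map each letter to its shape class (or to itself if it has none)."""
    return tuple(SHAPE_CLASS.get(ch, ch) for ch in word)


def same_shape_different_code(a, b):
    """True when two strings look alike but are stored as different codes."""
    return a != b and shape_key(a) == shape_key(b)


def describe(word):
    """List the Unicode character names, useful when inspecting a document."""
    return [unicodedata.name(ch) for ch in word]
```

Two strings for which `same_shape_different_code` returns `True` would be displayed identically, yet a plain string comparison or search would treat them as different words, which is why direct exchange between systems fails.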
Converting Mongolian words expressed in presentation characters into words expressed in the correct nominal characters, and thereby standardizing the storage of Mongolian information, is an important basic technical issue for Mongolian information processing. This thesis performs that conversion by combining rule-based, dictionary-based and statistical methods. Because different Mongolian information processing systems use different presentation character coding systems, the thesis uses the Min-grapheme code as an intermediate code so that the conversion applies to all of them.
Two parts of work are completed. First, the conversion relations between several Mongolian presentation character codes and the Min-grapheme code are investigated, and the various presentation codes are uniformly converted into the Min-grapheme code according to the resulting conversion tables. Second, several methods are used to convert the Min-grapheme code into the Mongolian nominal character code: a method based on the Mongolian orthography dictionary, a method based on separating stem and affix, and a method based on a statistical language model. These methods are used together to improve the conversion accuracy, and the desired results are achieved.
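The two-stage design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the system names, code values, grapheme labels (`g1`, `g2`, `g3`), Latin stand-ins for nominal letters, the tiny dictionary, stem and affix tables, and the bigram probabilities are all hypothetical. The order in which the three methods are tried is also an assumption; the abstract only says they are used together.

```python
from itertools import product

# Stage 1 tables (hypothetical): each legacy system's presentation codes
# map to a shared intermediate grapheme code.
TO_GRAPHEME = {
    "sysA": {0xE001: "g1", 0xE002: "g2", 0xE003: "g3"},
    "sysB": {0x81: "g1", 0x82: "g2", 0x83: "g3"},
}

# Stage 2 resources (hypothetical). Latin letters stand in for nominal letters.
CANDIDATES = {"g1": ["b"], "g2": ["o", "u"], "g3": ["d", "t"]}
DICTIONARY = {("g1", "g2"): "bu"}   # orthography dictionary
STEMS = {("g1", "g2"): "bu"}        # known stems
AFFIXES = {("g3",): "d"}            # known affixes
BIGRAM = {                          # letter bigram probabilities, "^" = word start
    ("^", "u"): 0.6, ("^", "o"): 0.4,
    ("u", "t"): 0.7, ("u", "d"): 0.3,
    ("o", "d"): 0.5, ("o", "t"): 0.5,
}


def to_graphemes(codes, system):
    """Stage 1: presentation codes -> intermediate grapheme sequence."""
    table = TO_GRAPHEME[system]
    return tuple(table[c] for c in codes)


def by_dictionary(graphemes):
    """Look the whole word up in the orthography dictionary."""
    return DICTIONARY.get(graphemes)


def by_stem_affix(graphemes):
    """Split into stem + affix, trying the longest stem first."""
    for cut in range(len(graphemes) - 1, 0, -1):
        stem, affix = graphemes[:cut], graphemes[cut:]
        if stem in STEMS and affix in AFFIXES:
            return STEMS[stem] + AFFIXES[affix]
    return None


def by_language_model(graphemes):
    """Enumerate candidate spellings and keep the most probable one."""
    def score(word):
        padded = "^" + word
        prob = 1.0
        for a, b in zip(padded, padded[1:]):
            prob *= BIGRAM.get((a, b), 0.01)  # small floor for unseen bigrams
        return prob

    options = ("".join(p) for p in product(*(CANDIDATES[g] for g in graphemes)))
    return max(options, key=score)


def convert(codes, system):
    """Stage 1 followed by stage 2 with dictionary, stem/affix, then LM fallback."""
    graphemes = to_graphemes(codes, system)
    return (by_dictionary(graphemes)
            or by_stem_affix(graphemes)
            or by_language_model(graphemes))
```

Routing every source system through one intermediate code means that supporting a new presentation coding only requires one new stage 1 table, while the stage 2 resources are shared by all systems.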
Keywords/Search Tags: Mongolian, Presentation character, Basic letter, Statistical language model