Research On Proofreading Algorithm Of Mongolian Homograph Based On Finite State Automata

Posted on: 2015-02-26 | Degree: Master | Type: Thesis
Country: China | Candidate: B Lian
GTID: 2268330428982759 | Subject: Computer Science and Technology
Abstract/Summary:
With the development of Mongolian information processing technology and the deepening of research in the field, ensuring the correctness of text has become increasingly important. The proofreading workload has therefore grown greatly. Manual correction can no longer keep pace with the rapid growth in the number of electronic texts, and automatic proofreading of Mongolian text has become an urgent problem.

Mongolian is written in an alphabetic script, but unlike other alphabetic scripts, it is written using presentation characters (positional glyph forms). Writers often input a presentation character that has the same shape as the intended one but a different pronunciation; we call this the "same shape, different pronunciation" (homograph) phenomenon. Such a word looks correct at the grapheme level, but its internal code is wrong. Because computers usually recognize words by their character encoding, these errors, if left uncorrected, make further Mongolian information processing research more difficult.

The traditional and effective method for detecting non-word errors is dictionary lookup: if a word is not in the dictionary, it is treated as a non-word. However, Mongolian is an agglutinative language written in an alphabetic script, and most words are formed by attaching different suffixes to a root or stem. Because a dictionary contains only a limited number of words, dictionary lookup must be combined with word-formation rules to extend vocabulary coverage; this is the usual approach when processing agglutinative languages.

This thesis integrates the dictionary and word-formation rules into a finite state automaton model and introduces a proofreading method for Mongolian homographs based on finite state automata. First, based on the "Mongolian Orthographic Dictionary", we construct a lexical analyzer according to the grammatical features and word structure of Mongolian.
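The idea of folding a root dictionary and word-formation (suffixation) rules into one automaton can be illustrated with a small nondeterministic FSA. This is a minimal sketch under assumed simplifications, not the thesis's implementation: words are written in a hypothetical ASCII transliteration rather than Mongolian Unicode codes, the lexicon is a toy list rather than the Mongolian Orthographic Dictionary, and the only word-formation rule is "root followed by zero or more suffixes", with no vowel harmony or stem alternation. `MorphFSA` and its methods are hypothetical names.

```python
from collections import defaultdict


class MorphFSA:
    """Toy NFA accepting ROOT (SUFFIX)*; roots and suffixes share trie-shaped paths."""

    def __init__(self, roots, suffixes):
        self.delta = defaultdict(dict)  # state -> {char: next state}
        self.eps = defaultdict(set)     # state -> states reachable by an epsilon move
        self.final = set()
        self.next_id = 0
        self.start = self._new()
        self.suffix_start = self._new()
        for root in roots:
            end = self._add_path(self.start, root)
            self.final.add(end)                   # a bare root is a valid word
            self.eps[end].add(self.suffix_start)  # a root may take suffixes
        for suffix in suffixes:
            end = self._add_path(self.suffix_start, suffix)
            self.final.add(end)                   # root + suffix is a valid word
            self.eps[end].add(self.suffix_start)  # suffixes may be chained

    def _new(self):
        sid = self.next_id
        self.next_id += 1
        return sid

    def _add_path(self, state, string):
        # Reuse existing transitions so shared prefixes share states.
        for ch in string:
            if ch not in self.delta[state]:
                self.delta[state][ch] = self._new()
            state = self.delta[state][ch]
        return state

    def _closure(self, states):
        # All states reachable from `states` through epsilon moves.
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in self.eps[s]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def accepts(self, word):
        current = self._closure({self.start})
        for ch in word:
            nxt = {self.delta[s][ch] for s in current if ch in self.delta[s]}
            if not nxt:
                return False  # no path: treated as a non-word
            current = self._closure(nxt)
        return bool(current & self.final)


fsa = MorphFSA(roots=["nom", "ger"], suffixes=["ud", "un"])
print(fsa.accepts("nomudun"), fsa.accepts("num"))
```

A plain dictionary (for example a Python `set`) would have to list every inflected form explicitly; in this sketch two roots and two suffixes already cover forms such as `nomud` and `nomudun` without enumerating them, which is the coverage argument made above.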
Second, we build a rule base of homoemorphic (same-shape) characters from the basic character set of the international standard code for traditional Mongolian, the presentation forms of the traditional Mongolian nominal characters, and the glyph deformation rules. Finally, we perform a heuristic search on the lexical analyzer guided by this homoemorphic-character rule base. If the input word is correct, it is left unchanged; if it is wrong, it is corrected to the appropriate homograph, i.e. the correctly encoded word with the same written shape.
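The correction step can be pictured as a best-first (uniform-cost) search that walks the automaton while allowing each input code to be swapped for a same-shape code from the rule base, preferring candidates with the fewest substitutions. This is a sketch under stated assumptions, not the thesis's actual heuristic: the homoemorphy classes below (`o`/`u`, `t`/`d`, `g`/`q` in an ASCII transliteration) are simplified and hypothetical, the lexical analyzer is replaced by a trie (an acyclic DFA) over a toy word list, and `build_rule_base`, `build_trie`, `correct` and the `max_changes` limit are hypothetical names.

```python
import heapq

# Hypothetical, simplified homoemorphy rule base over an ASCII transliteration:
# letters in the same class are ASSUMED to share one written shape.
HOMOEMORPHY_CLASSES = [{"o", "u"}, {"t", "d"}, {"g", "q"}]


def build_rule_base(classes):
    """Map each code to the other codes that look identical to it."""
    rules = {}
    for cls in classes:
        for ch in cls:
            rules[ch] = sorted(cls - {ch})
    return rules


def build_trie(words):
    """Acyclic DFA (trie) used here as a stand-in for the lexical analyzer."""
    delta, final = [{}], set()
    for w in words:
        state = 0
        for ch in w:
            if ch not in delta[state]:
                delta.append({})
                delta[state][ch] = len(delta) - 1
            state = delta[state][ch]
        final.add(state)
    return delta, final


def correct(word, delta, final, rules, max_changes=2):
    """Return `word` if the automaton accepts it; otherwise the accepted
    same-shape variant with the fewest substitutions, or None."""
    heap = [(0, 0, 0, "")]  # (substitutions, position, state, output so far)
    while heap:
        changes, pos, state, out = heapq.heappop(heap)
        if pos == len(word):
            if state in final:
                return out  # first accepted result has minimal substitutions
            continue
        ch = word[pos]
        # Try the typed code first, then its homoemorphic alternatives.
        for cand in [ch] + rules.get(ch, []):
            cost = changes + (cand != ch)
            nxt = delta[state].get(cand)
            if nxt is not None and cost <= max_changes:  # prune dead branches
                heapq.heappush(heap, (cost, pos + 1, nxt, out + cand))
    return None


rules = build_rule_base(HOMOEMORPHY_CLASSES)
delta, final = build_trie(["nom", "nomud", "ger", "tere"])
print(correct("num", delta, final, rules))
```

With this toy lexicon, `correct("num", ...)` returns `"nom"` and `correct("nom", ...)` returns `"nom"` unchanged, mirroring the rule that correct words receive no processing. Ordering the queue by substitution count is one simple heuristic; a real system could instead weight substitutions by how often each confusion occurs.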
Keywords/Search Tags: Mongolian, Homograph, Finite State Automata, Spelling Correction