Research On Proofreading Algorithm Of Mongolian Homograph Based On Finite State Automata

Posted on:2015-02-26

Degree:Master

Type:Thesis

Country:China

Candidate:B Lian

GTID:2268330428982759

Subject:Computer Science and Technology

Abstract/Summary:

As Mongolian information processing technology develops and research in the field deepens, ensuring the correctness of text becomes increasingly important. The proofreading workload has grown sharply, and manual correction can no longer keep pace with the rapid growth in the number of electronic texts, so automatic proofreading of Mongolian text has become an urgent problem.

Mongolian is an alphabetic script, but unlike other alphabetic scripts it uses presentation characters to render text in writing. Writers often type a presentation character that has the same shape as the intended one but a different pronunciation; we call this the same-shape, different-pronunciation phenomenon. The resulting word looks correct on the page, but its internal encoding is wrong. Because computers in most cases recognize words by their character encoding, leaving these errors uncorrected makes further research in Mongolian information processing more difficult.

The traditional and effective way to detect non-word errors is dictionary lookup: a word that is not in the dictionary is treated as a non-word. However, Mongolian is an agglutinative language as well as an alphabetic one, and most words are formed by attaching different suffixes to a root or stem. Since a dictionary contains only a limited number of words, it should be combined with word-formation rules to expand vocabulary coverage, which is the common approach when processing agglutinative languages.

This thesis integrates a dictionary and word-formation rules into a finite state automaton model and presents a method for proofreading Mongolian homographs based on finite state automata. First, based on the "Mongolian orthographic dictionary", we construct a lexical analyzer according to the grammatical features and word structure of Mongolian. Second, we build a rule base of same-shape (homomorphic) characters from the basic character set of the international standard code for traditional Mongolian, the presentation characters of the nominal characters, and their deformation rules. Finally, we run a heuristic search over the lexical analyzer guided by this rule base. If the input word is correct, no processing is done; if it is wrong, it is corrected using its homograph.
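
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. Everything in it is an assumption made for illustration: the names `STEMS`, `SUFFIXES`, `CONFUSABLE`, `build_automaton`, and `proofread` are hypothetical, Latin letters stand in for Mongolian code points, the word-formation rules are reduced to "stem plus one suffix", and a best-first search that prefers the fewest substitutions stands in for the heuristic search, whose details the abstract does not give.

```python
from heapq import heappush, heappop

# Hypothetical toy data: Latin letters stand in for Mongolian code points.
STEMS = {"nom", "ger"}          # assumed dictionary stems
SUFFIXES = {"", "ud", "tu"}     # assumed suffixes ("" means the bare stem)
# Assumed rule base: characters treated as having the same written shape.
CONFUSABLE = {"o": "u", "u": "o", "t": "d", "d": "t"}


def build_automaton(stems, suffixes):
    """Build a trie-shaped deterministic automaton accepting stem + suffix."""
    states = [{}]               # states[i] maps a character to the next state
    accepting = set()
    for stem in stems:
        for suffix in suffixes:
            state = 0
            for ch in stem + suffix:
                if ch not in states[state]:
                    states.append({})
                    states[state][ch] = len(states) - 1
                state = states[state][ch]
            accepting.add(state)
    return states, accepting


def proofread(word, states, accepting, confusable=CONFUSABLE):
    """Return the word if accepted, else the accepted variant reachable with
    the fewest same-shape substitutions, else None."""
    # Each entry: (substitutions so far, position in word, state, output).
    frontier = [(0, 0, 0, "")]
    while frontier:
        cost, pos, state, out = heappop(frontier)
        if pos == len(word):
            if state in accepting:
                return out
            continue
        ch = word[pos]
        # Try the typed character first (cost 0), then same-shape alternatives.
        candidates = [(ch, 0)] + [(alt, 1) for alt in confusable.get(ch, "")]
        for cand, penalty in candidates:
            nxt = states[state].get(cand)
            if nxt is not None:
                heappush(frontier, (cost + penalty, pos + 1, nxt, out + cand))
    return None


states, accepting = build_automaton(STEMS, SUFFIXES)
print(proofread("nomud", states, accepting))  # accepted as typed
print(proofread("numud", states, accepting))  # one same-shape substitution
print(proofread("xyz", states, accepting))    # no accepted variant
```

Because the search only follows transitions that exist in the automaton, a substitution is kept only when it leads toward a word the dictionary and suffix rules accept. A word with no accepted same-shape variant is reported as `None` instead of being changed.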

Keywords/Search Tags:

Mongolian, Homograph, Finite State Automata, Spelling Correction

Related items

1	Research On Cyrillic And Mongolian Script's Morphology And Conversion System
2	Global Interpretation Of GBDT Model Based On Probabilistic Finite-State Automata
3	Applications Of Spelling Correction Techniques In Information Retrieval And Text Processing
4	Chinese Spelling Error Correction Algorithm Incorporating Multimodal Semantic Features And Applications
5	Smallest State After The Computation Of Finite Automata
6	Societies of randomly interacting finite-state automata
7	Algebraic Properties Of Probabilistic Finite Automata
8	Research On The Homonyms Of Mongolian Network Text Disambiguation Algorithm
9	Hierarchical Phrase-Based Mongolian-Chinese Statistical Machine Translation
10	The formalism of affordance in human-machine cooperative systems using finite state automata (FSA)