Research on a Mongolian Word Segmentation System Based on Dictionary, Rules and Language Model

Posted on: 2012-07-30; Degree: Master; Type: Thesis
Country: China; Candidate: Y Ming
GTID: 2178330335472224; Subject: Computer Science and Technology
Abstract/Summary:
Mongolian is one of the important minority languages of China, and the development of Mongolian information processing technology matters greatly to the political, cultural and social development of Mongolian communities. Research on Mongolian word segmentation is the foundation for much of the subsequent Mongolian information processing work.

This thesis is a first attempt at word segmentation for traditional Mongolian, and also the first to combine three different methods: dictionary-based, rule-based and statistical language model-based segmentation. Considerable effort went into organizing and proofreading a Mongolian corpus, and these data should be of great help in future work. The resulting Mongolian word segmentation system achieves the intended segmentation efficiency, and its test platform provides a unified evaluation platform for traditional Mongolian word segmentation.

The thesis studies the characteristics and syntax of the Mongolian language and proposes an improved, layered Mongolian language model. The model considers not only contextual relationships but also the tight coupling of the components within a single word.

The segmentation system works in three stages. First, it preprocesses part of the Mongolian words. Next, it segments the majority of the words with the dictionary-based method. Finally, it processes the remaining words in two steps: it generates multiple candidate segmentations using various Mongolian rules, and then uses the improved layered language model to select the correct segmentation from among the candidates. Combining the three methods lets each contribute its own strengths, yielding a traditional Mongolian word segmentation system with excellent performance.
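The three-stage pipeline described in the abstract (preprocessing, dictionary lookup, then rule-generated candidates ranked by a language model) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes that "segmentation" means splitting a word into a stem and suffix components (the abstract's "components in a single word"). `DICTIONARY`, `SUFFIXES` and `CORPUS` are hypothetical toy resources written in Latin transliteration rather than traditional Mongolian script. The preprocessing step is assumed to simply pass non-alphabetic tokens through. The ranking model is a plain component bigram model interpolated with a unigram model (Jelinek-Mercer smoothing), which stands in for, and does not reconstruct, the improved layered model.

```python
import math
from collections import Counter

# Hypothetical toy resources in Latin transliteration; NOT the thesis's data.
DICTIONARY = {"nomuud": ["nom", "uud"]}   # word -> stored segmentation
SUFFIXES = ["uud", "iin", "aas", "d"]      # illustrative suffix inventory
CORPUS = [["nom", "uud"], ["ger", "iin"], ["ger", "aas"], ["nom", "d"], ["ger"]]


class InterpolatedBigramLM:
    """Bigram model over word components, interpolated with a unigram model.

    A stand-in for the thesis's layered model, not a reconstruction of it.
    """

    def __init__(self, segmented_words, lam=0.5, unseen_prob=1e-4):
        self.lam, self.unseen_prob = lam, unseen_prob
        self.context, self.target, self.bigram = Counter(), Counter(), Counter()
        for parts in segmented_words:
            seq = ["<w>"] + parts + ["</w>"]      # word-boundary markers
            self.context.update(seq[:-1])
            self.target.update(seq[1:])
            self.bigram.update(zip(seq[:-1], seq[1:]))
        self.total = sum(self.target.values())

    def unigram(self, tok):
        count = self.target[tok]
        return count / self.total if count else self.unseen_prob

    def prob(self, prev, tok):
        if not self.context[prev]:                # unseen context: unigram only
            return self.unigram(tok)
        bigram = self.bigram[(prev, tok)] / self.context[prev]
        return self.lam * bigram + (1 - self.lam) * self.unigram(tok)

    def log_prob(self, parts):
        seq = ["<w>"] + parts + ["</w>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(seq[:-1], seq[1:]))


def rule_candidates(word, min_stem=2):
    """Rule step: the unsplit word plus every split made by stripping suffixes."""
    candidates = [[word]]
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem:
            candidates += [head + [suffix] for head in rule_candidates(stem, min_stem)]
    return candidates


def segment(word, lm):
    if not word.isalpha():        # assumed preprocessing: pass non-words through
        return [word]
    if word in DICTIONARY:        # dictionary step
        return DICTIONARY[word]
    # rules generate candidates; the language model picks the most probable one
    return max(rule_candidates(word), key=lm.log_prob)


lm = InterpolatedBigramLM(CORPUS)
print(segment("geruud", lm))   # ['ger', 'uud']
```

Here `geruud` is not in the dictionary, so the rules propose `geruud`, `ger + uud` and `geruu + d`, and the model prefers the split whose components are all attested in the corpus. Interpolating with the unigram model matters in this toy setting: with plain add-one smoothing over such a small vocabulary, the unsplit candidate can win simply because its score has fewer factors.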
Keywords/Search Tags: Mongolian, dictionary, rules, Statistical Language Model