Statistics-Based Mongolian Part-of-Speech Tagging: Study And Implementation

Posted on:2011-09-14

Degree:Master

Type:Thesis

Country:China

Candidate:H Yan

GTID:2178360305992514

Subject:Computer application technology

Abstract/Summary:

With the rapid development and spread of computer technology, and of network technology in particular, the need to exchange information between natural language and computers is growing. Natural language information processing has therefore received unprecedented attention from researchers at home and abroad. Part-of-speech (POS) tagging is the basis of natural language information processing, and its accuracy directly affects follow-up work. Much research has been done on automatic part-of-speech tagging for Chinese, with significant results, but comparable work on automatic Mongolian part-of-speech tagging is still lacking.

This thesis studies automatic Mongolian part-of-speech tagging and implements a statistics-based tagging system. The system trains a hidden Markov model (HMM) on the training corpus to obtain two key model parameters: the part-of-speech transition probability matrix and the word probability distribution matrix. These parameters are then used by the Viterbi algorithm to tag text automatically. The data sparsity problem of the HMM is addressed with word segmentation and linear interpolation, which to a certain extent avoids the loss of tagging accuracy that sparse data would otherwise cause.

Finally, the system is tested on Mongolian text both before and after Mongolian segmentation. First, closed and open tests are run on corpora of different sizes. Then, closed and open tests are run with part-of-speech tag sets 2 and 3. The evaluation criteria are POS tagging accuracy and disambiguation accuracy on multi-category words. With a 950,000-word training corpus and a 50,000-word test set, the experimental results show a POS tagging accuracy of about 97.9% and a disambiguation accuracy of about 85.9% in the closed test, and about 97.6% and 85.5% respectively in the open test.
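
The approach summarized above (HMM parameters estimated from a tagged corpus, linear interpolation against sparse counts, Viterbi decoding) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy English corpus, the tag labels D/N/V, the weight `lam = 0.9`, and the particular interpolation scheme (bigram mixed with unigram for transitions, relative frequency mixed with a uniform distribution for words) are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(tagged_sentences):
    """Collect the counts behind the transition and word-distribution matrices."""
    trans, emit, ctx, tag_count = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sent:
            trans[(prev, tag)] += 1   # tag-to-tag transition count
            ctx[prev] += 1            # times prev was followed by some tag
            emit[(tag, word)] += 1    # word observed with this tag
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, ctx, tag_count, vocab

def viterbi(words, model, lam=0.9):
    """Return the most probable tag sequence for a non-empty word list."""
    trans, emit, ctx, tag_count, vocab = model
    tags = list(tag_count)
    total = sum(tag_count.values())

    def log_t(prev, tag):
        # Linear interpolation: bigram estimate mixed with the unigram estimate.
        bigram = trans[(prev, tag)] / ctx[prev] if ctx[prev] else 0.0
        return math.log(lam * bigram + (1 - lam) * tag_count[tag] / total)

    def log_e(tag, word):
        # Relative frequency mixed with a uniform distribution, so that
        # unseen words keep a small non-zero probability.
        rel = emit[(tag, word)] / tag_count[tag]
        return math.log(lam * rel + (1 - lam) / (len(vocab) + 1))

    best = {t: log_t("<s>", t) + log_e(t, words[0]) for t in tags}
    back = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + log_t(p, t))
            new_best[t] = best[prev] + log_t(prev, t) + log_e(t, word)
            pointers[t] = prev
        best = new_best
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy corpus with placeholder English words; not data from the thesis.
corpus = [
    [("the", "D"), ("dog", "N"), ("barks", "V")],
    [("the", "D"), ("cat", "N"), ("sleeps", "V")],
    [("a", "D"), ("dog", "N"), ("sleeps", "V")],
]
model = train(corpus)
print(viterbi(["the", "dog", "sleeps"], model))
print(viterbi(["the", "zebra", "sleeps"], model))  # "zebra" is unseen
```

Working in log space avoids numeric underflow on long sentences, and the interpolation keeps every probability strictly positive, so an unseen word such as "zebra" is tagged from its context rather than causing a zero-probability path.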

Keywords/Search Tags:

Mongolian Part-of-Speech Tagging, Statistical Method, Hidden Markov Model, Viterbi Algorithm

Related items

1	HMM-based Chinese Part-of-Speech Tagging And Improvement
2	The Research Of Part-of-speech Tagging Based On Hidden Markov Model
3	Statistics-based Chinese Pos Tagging Method
4	Research On Laodian Participle And Part-of-speech Tagging Method
5	Application Of Hidden Markov Model In Part-of-Speech Tagging
6	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
7	Hidden Markov Model Parameters Estimation For Part-of-Speech Tagging
8	Research On Kirghiz Basic Part-of-Speech Tagging Based On HMM
9	Research On Improvements Of Chinese Part-of-Speech Tagging System Based On Statistical Model
10	Research On Mongolian Lexical Analysis Based On Combination Of Statistical And Rule Approaches