Research On Kirghiz Basic Part-of-Speech Tagging Based On HMM

Posted on:2014-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:L Chen

Full Text:PDF

GTID:2248330398967936

Subject:Computer application technology

Abstract/Summary:

Part-of-speech tagging is a foundation of natural language processing, and its accuracy directly affects research on and applications of syntactic analysis, information extraction, and machine translation. For Kirghiz natural language processing, part-of-speech tagging is the primary problem. Significant results have been achieved in this area for English, Chinese, Uighur, and Kazakh; however, relatively few existing studies address part-of-speech tagging for Kirghiz. Therefore, establishing a large-scale standard Kirghiz corpus and completing the part-of-speech tagging stage of shallow parsing are of great significance for the subsequent development of Kirghiz language processing.

Given the current state of research on Kirghiz, this thesis designs and implements an HMM-based part-of-speech tagging system for Kirghiz. The system is divided into a stem extraction module and a part-of-speech tagging module. First, we produce an initial part-of-speech tagging of Kirghiz text using dictionary matching and rule-based disambiguation, and use the resulting tagged corpus as the training corpus. Second, we improve the bigram HMM by adding the part of speech of the preceding word as additional context, and obtain the part-of-speech transition probability matrix and the lexical probability distribution matrix. To address data sparseness in the corpus, we apply Katz smoothing to adjust the model parameters, and we use the Viterbi algorithm to complete the part-of-speech tagging.

To verify the feasibility of this approach, the standard hidden Markov model and the improved hidden Markov model were evaluated in open and closed tests on corpora of the same size and of different sizes. The evaluation criteria were unknown-word tagging accuracy, disambiguation accuracy for multi-category words, and overall part-of-speech tagging accuracy. Because the improved HMM incorporates more context information, its accuracy is higher than that of the standard HMM, as the experimental results confirm.
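
The pipeline described above (estimate transition and lexical probabilities from a tagged corpus, smooth them, decode with Viterbi) can be illustrated with a minimal sketch. This is not the thesis's implementation: it shows only the standard bigram HMM, not the improved model, and it substitutes simple add-alpha smoothing for Katz smoothing to stay short. The function names `train_hmm` and `viterbi`, the `<s>` start symbol, and the English toy corpus are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences, alpha=1.0):
    """Count tag transitions and word emissions, then return smoothed
    log-probability functions. Add-alpha smoothing is used here as a
    simple stand-in for the Katz smoothing named in the abstract."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start symbol
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + alpha) / (total + alpha * len(tags)))

    def log_emit(tag, word):
        total = sum(emit[tag].values())
        # The extra +1 in the denominator reserves mass for unknown words.
        return math.log((emit[tag][word] + alpha) / (total + alpha * (len(vocab) + 1)))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []
    # best[i][t]: best log-probability of any tag path ending in t at position i
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_trans(p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy English corpus, purely illustrative (not Kirghiz data from the thesis).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
tags, log_trans, log_emit = train_hmm(corpus)
print(viterbi(["the", "cat", "runs"], tags, log_trans, log_emit))
```

Working in log space avoids numeric underflow on long sentences. A closer reproduction of the thesis would replace the add-alpha estimates with Katz back-off and extend the conditioning context as the improved model requires; the decoding loop would keep the same shape.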

Keywords/Search Tags:

Hidden Markov Model, Kirghiz Language, Basic Part-of-speech Tagging, Natural Language Processing, Smoothing Algorithm

Related items

1	Study Of Kazak Part-of-Speech Tagging Based Upon HMM
2	Application Of Hidden Markov Model In Part-of-Speech Tagging
3	Software Requirements Verification Based On Natural Language Processing
4	The Research Of Part-of-speech Tagging Based On Hidden Markov Model
5	Chinese Part-of-Speech Tagging Based On Ameliorated Hidden Makov Model
6	Statistics-based Chinese Pos Tagging Method
7	HMM-based Chinese Part-of-Speech Tagging And Improvement
8	Study On Disambiguation Algorithm For Chinese Word Segmentation
9	Hidden Markov Model Parameters Estimation For Part-of-Speech Tagging
10	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model