Research On Kirghiz Basic Part-of-Speech Tagging Based On HMM

Posted on: 2014-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2248330398967936
Subject: Computer application technology
Abstract/Summary:
Part-of-speech tagging is a foundation of natural language research, and its accuracy directly affects research on and applications of syntactic analysis, information extraction, and machine translation. For Kirghiz natural language processing, part-of-speech tagging is the primary problem. Significant results have been achieved in this area for English, Chinese, Uighur, and Kazakh; however, existing work includes relatively few studies on Kirghiz part-of-speech tagging. Therefore, establishing a large-scale standard Kirghiz corpus and completing part-of-speech tagging as part of shallow parsing are of great significance for the subsequent development of Kirghiz language processing.

Based on the current state of research on the Kirghiz language, this thesis designs and implements a Kirghiz part-of-speech tagging system based on the hidden Markov model (HMM). The system is divided into a stem extraction module and a part-of-speech tagging module. First, we produce an initial part-of-speech tagging of Kirghiz text using dictionary matching and rule-based disambiguation, and use the tagged corpus as the training corpus. Second, we improve the bigram HMM by adding the part of speech of the preceding word as context information, and from this model we obtain the part-of-speech transition probability matrix and the lexical probability distribution matrix. To address data sparseness in the corpus, we use Katz smoothing to adjust the model parameters. Finally, the Viterbi algorithm is used to complete the part-of-speech tagging.

To verify the feasibility of this approach, the standard HMM and the improved HMM were evaluated in open and closed tests on corpora of the same size and of different sizes. Unknown-word tagging accuracy, multi-category word disambiguation accuracy, and overall part-of-speech tagging accuracy were used as the evaluation criteria. Because the improved HMM incorporates additional context information, its accuracy is higher than that of the standard HMM, which is confirmed by the experimental results.
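
The tagging step described above can be illustrated with a minimal sketch of a standard bigram HMM tagger decoded with the Viterbi algorithm. This is not the thesis's system: it implements only the standard model (not the improved one), it substitutes simple add-one smoothing for Katz smoothing to stay short, and the function names (train_hmm, viterbi), the tag names, and the toy English training sentences are all hypothetical placeholders rather than Kirghiz data from the thesis.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag


def train_hmm(tagged_sentences, alpha=1.0):
    """Count tag transitions and word emissions, then return smoothed
    log-probability functions. Add-alpha smoothing is an assumption
    used here in place of the Katz smoothing named in the abstract."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tags, vocab = set(), set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    tags = sorted(tags)
    vocab_size = len(vocab) + 1  # one extra slot for unknown words

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + alpha) / (total + alpha * len(tags)))

    def log_emit(tag, word):
        total = sum(emit[tag].values())
        return math.log((emit[tag][word] + alpha) / (total + alpha * vocab_size))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    # best[i][t]: best log-score of any tag path ending in tag t at position i
    best = [{t: log_trans(START, t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_trans(p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy training corpus (placeholder data, not from the thesis).
corpus = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
tags, log_trans, log_emit = train_hmm(corpus)
print(viterbi(["the", "cat", "barks"], tags, log_trans, log_emit))
```

In this sketch, a word never seen in training receives the same small smoothed emission probability under every tag, so its tag is decided by the transition probabilities alone. That is a rough stand-in for the unknown-word case that the evaluation measures separately.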
Keywords/Search Tags: Hidden Markov Model, Kirghiz Language, Basic Part-of-speech Tagging, Natural Language Processing, Smoothing Algorithm