
Study Of Kazak Part-of-Speech Tagging Based Upon HMM

Posted on: 2012-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: C F Hou
Full Text: PDF
GTID: 2178330335985931
Subject: Computer application technology
Abstract/Summary:
Part-of-speech tagging is an important intermediate step in natural language analysis and understanding. It plays a central role in natural language information processing and underlies syntactic analysis, information extraction, and machine translation. The same holds for Kazakh. Research on Kazakh part-of-speech tagging can be applied directly to Kazakh speech recognition, machine translation, information retrieval, and many other practical applications, so it has considerable practical significance.

This thesis uses three tagging methods: Baum-Welch, a traditional HMM, and an improved HMM. The Baum-Welch method is used to tag raw (untagged) corpora. The traditional HMM does not handle words that belong to more than one category (ambiguous words) well, so the thesis improves the model parameters of the traditional HMM. The traditional model conditions only on the preceding context; the improved HMM adds a dependence on the following word and refines the calculation of the word emission probability, so that it better reflects the context dependence of words.

In addition, a smoothing technique with stable performance is used for data smoothing, and a method is proposed for handling unregistered (out-of-vocabulary) words in the text. The model parameters are estimated statistically from a Kazakh training corpus, and the Viterbi algorithm is then used to carry out part-of-speech tagging. The system was trained separately on corpora of 10 million, 20 million, and 25 million words. The experimental results show that the improved HMM performs better at part-of-speech tagging than the traditional HMM.
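The training-and-decoding pipeline described in the abstract can be illustrated with a minimal Python sketch of a traditional bigram HMM tagger. It is not the thesis's implementation: the abstract does not specify the improved model's parameters or which smoothing technique was used, so add-one (Laplace) smoothing is swapped in here as an assumption, and the function names (`train`, `log_trans`, `log_emit`, `viterbi`) and the tiny English toy corpus are invented for illustration only.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker

def train(tagged_sentences):
    """Collect transition and emission counts from a hand-tagged corpus."""
    trans, emit = Counter(), Counter()
    ctx, tag_count = Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            trans[(prev, tag)] += 1   # count(prev tag -> tag)
            ctx[prev] += 1            # times prev was a left context
            emit[(tag, word)] += 1    # count(tag emits word)
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
    return {"trans": trans, "emit": emit, "ctx": ctx,
            "tag_count": tag_count, "tags": sorted(tag_count),
            "vocab_size": len(vocab)}

def log_trans(m, prev, tag):
    # Add-one smoothing over the tag set (an assumption, see lead-in).
    return math.log((m["trans"][(prev, tag)] + 1)
                    / (m["ctx"][prev] + len(m["tags"])))

def log_emit(m, tag, word):
    # Add-one smoothing; the extra +1 reserves mass for unknown words.
    return math.log((m["emit"][(tag, word)] + 1)
                    / (m["tag_count"][tag] + m["vocab_size"] + 1))

def viterbi(m, words):
    """Return the most probable tag sequence for words under the model."""
    if not words:
        return []
    tags = m["tags"]
    best = [{t: log_trans(m, START, t) + log_emit(m, t, words[0])
             for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        scores, pointers = {}, {}
        for t in tags:
            # Best predecessor tag for tag t at position i.
            p = max(tags, key=lambda q: best[i - 1][q] + log_trans(m, q, t))
            scores[t] = (best[i - 1][p] + log_trans(m, p, t)
                         + log_emit(m, t, words[i]))
            pointers[t] = p
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy English data, invented for illustration (not Kazakh corpus data).
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train(toy_corpus)
print(viterbi(model, ["the", "cat", "runs"]))  # ['DET', 'NOUN', 'VERB']
```

Scores are kept in log space so that long sentences do not underflow. In this sketch a word never seen in training receives the same smoothed emission probability under every tag, so its tag is decided by the transition probabilities alone; the thesis proposes its own treatment of such words, which the abstract does not detail.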
Keywords/Search Tags: HMM, Kazakh, Part-of-Speech Tagging, Natural Language Processing, Smoothing Algorithm