Study Of Kazak Part-of-Speech Tagging Based Upon HMM

Posted on:2012-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:C F Hou

GTID:2178330335985931

Subject:Computer application technology

Abstract/Summary:

Part-of-speech tagging is an important intermediate step in natural language analysis and understanding. It plays an important role in natural language information processing and is the basis of syntactic analysis, information extraction, and machine translation. The same is true for Kazakh. Research on Kazakh part-of-speech tagging can be applied directly to Kazakh speech recognition, machine translation systems, Kazakh information retrieval, and many other practical applications, so it has important practical significance.

In this thesis, three methods are used for tagging: Baum-Welch, the traditional HMM, and an improved HMM. The Baum-Welch method is used for tagging raw (unannotated) corpora. The traditional HMM does not perform well on words that can belong to more than one category. Therefore, the model parameters are improved on the basis of the traditional HMM. The improved HMM compensates for the traditional model's dependence on the preceding context only by adding a dependence on the following word, and it optimizes the calculation of the word emission probability so that the model better reflects the context dependence of words.

In addition, a smoothing technique with stable performance is used for data smoothing, and a method is proposed for handling unknown words, that is, words in the text that are not in the lexicon. Statistical methods are used to train on a Kazakh corpus, and the Viterbi algorithm is then used to perform part-of-speech tagging. The system is trained separately on corpora of 10 million, 20 million, and 25 million words. The experimental results show that the improved HMM achieves better part-of-speech tagging results than the traditional HMM.
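
The pipeline the abstract describes (estimate HMM parameters from a tagged corpus, smooth them, then decode with Viterbi) can be illustrated with a minimal sketch. This is not the thesis's implementation: it shows only a traditional first-order HMM, it substitutes simple add-one smoothing for the unspecified smoothing technique, and it does not include the improved model's dependence on the following word. The toy English corpus, the tag names, and the function names `train`, `log_prob`, and `viterbi` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of tagged sentences; not data from the thesis.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    vocab = set()
    for sent in sentences:
        prev = "<s>"              # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (assumed smoothing, not the thesis's)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence under the first-order HMM."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags = len(tags)
    n_words = len(vocab) + 1      # one extra outcome reserved for unknown words
    scores = [{t: log_prob(trans["<s>"], t, n_tags)
                  + log_prob(emit[t], words[0], n_words) for t in tags}]
    back = []                     # back[i][t] = best previous tag for t at i+1
    for word in words[1:]:
        prev_scores = scores[-1]
        cur, ptr = {}, {}
        for t in tags:
            best = max(tags, key=lambda p: prev_scores[p]
                       + log_prob(trans[p], t, n_tags))
            cur[t] = (prev_scores[best] + log_prob(trans[best], t, n_tags)
                      + log_prob(emit[t], word, n_words))
            ptr[t] = best
        scores.append(cur)
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: scores[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

model = train(TRAIN)
print(viterbi(["the", "dog", "sleeps"], model))    # ['DET', 'NOUN', 'VERB']
print(viterbi(["the", "zebra", "sleeps"], model))  # ['DET', 'NOUN', 'VERB']
```

In the second call, "zebra" never appears in the training data, so its smoothed emission probability is the same for every tag and the transition probabilities alone decide its tag. This is one simple way to treat unknown words; the thesis's own treatment is not specified in the abstract.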

Keywords/Search Tags:

HMM, Kazakh, Part-of-Speech Tagging, Natural Language Processing, Smoothing Algorithm

Related items

1	Research On Kirghiz Basic Part-of-Speech Tagging Based On HMM
2	The Development Of Part-of-speech Tagging Software For Kazakh Language
3	Research On Methods For Kazakh Lexical Analyzing And Phrase Parsing Based On Rules And Statistics
4	Chinese Word Found Its Part Of Speech Tagging
5	Chinese Part-of-Speech Tagging Based On Ameliorated Hidden Markov Model
6	Research On Lao Language Part-of-speech Tagging With Multiple Features
7	A Research On Lao Language Part-of-speech Tagging With Multi-feature Fusion
8	Research On Parallel Corpora-based Unsupervised Part-of-speech Tagging For Chinese
9	Research On Part-of-Speech Tagging Algorithms Of Mathematical Corpus Based On Deep Learning
10	Research On Text Document Information Hiding