HMM-based Chinese Part-of-Speech Tagging And Improvement

Posted on:2012-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhu

GTID:2178330332490700

Subject:Computer application technology

Abstract/Summary:

Part-of-speech (POS) tagging is an important research topic in natural language processing. It has a wide range of applications and serves as a foundation for information processing. The quality of POS tagging directly affects the accuracy of every downstream task built on its results, such as syntactic analysis, speech recognition, text classification, text-to-speech, information retrieval, and machine translation. POS tagging poses several difficulties, including disambiguating words that can belong to more than one part of speech (multi-category words) and handling unknown words and proper nouns. Because of the characteristics of the Chinese language itself and the limits of Chinese linguistics research, Chinese POS tagging is even more difficult and complex.

POS tagging methods fall broadly into two categories: rule-based methods and statistical methods. HMM-based POS tagging is a typical statistical method. Although the application of HMMs to POS tagging is mature, improving tagging accuracy for multi-category words and unknown words remains a focus of research on HMM-based POS tagging. Using the tagged Chinese corpus 《People's Daily (Jan. 1998)》, this thesis builds a second-order hidden Markov model (HMM2), improves the tagging of unknown words, and trains, tests, and evaluates the model to perform Chinese POS tagging. The work is as follows:

(1) Because the choice of corpus strongly influences POS tagging results, the corpus is preprocessed before training and testing. The preprocessing removes the second level of tagging and the proper-noun tagging marks (the proper nouns and their tags are retained) to improve the accuracy of the experiments.

(2) When a general HMM performs POS tagging, it relies only on the tag of the previous word to estimate the tag of the current word. From a linguistic point of view, this does not fully extract the semantic information in the context. The thesis therefore proposes a second-order HMM that makes greater use of contextual information and so raises tagging accuracy. In building the second-order HMM, the state transition probabilities obtained from the training data are smoothed. In addition, based on what was observed in testing, the method of obtaining observation probabilities is modified and unknown words are processed, to further secure the accuracy of the experiments.

(3) In testing, the traditional Viterbi algorithm does not fit the improved second-order HMM, so the Viterbi algorithm is improved and extended to meet the needs of the modified model.

Open tests on ten thousand words, using training corpora annotated with 26 tags and with 39 tags, show that the improved second-order HMM proposed in this thesis performs better than both the general HMM and the unmodified HMM2. Finally, the thesis gives an outlook on the development of POS tagging.
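The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea behind points (2) and (3): a second-order HMM in which each tag is conditioned on the two preceding tags, decoded with a Viterbi search whose states are tag pairs. Everything specific here is an assumption made for illustration and not taken from the thesis: the function names (`train`, `transition`, `emission`, `viterbi2`), the use of linear interpolation with fixed weights as the smoothing method, the uniform emission probability given to unknown words, and the three-sentence toy corpus (which is not drawn from the People's Daily corpus).

```python
import math
from collections import Counter

START = "<s>"  # padding tag placed twice before each sentence


def train(tagged_sents):
    """Collect tag n-gram counts and word/tag emission counts."""
    uni, bi, tri = Counter(), Counter(), Counter()
    ctx1, ctx2, emit = Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sents:
        tags = [START, START] + [t for _, t in sent]
        for word, tag in sent:
            emit[(tag, word)] += 1
            vocab.add(word)
        for i in range(2, len(tags)):
            t1, t2, t3 = tags[i - 2], tags[i - 1], tags[i]
            uni[t3] += 1
            bi[(t2, t3)] += 1
            tri[(t1, t2, t3)] += 1
            ctx1[t2] += 1          # how often t2 occurs as a one-tag history
            ctx2[(t1, t2)] += 1    # how often (t1, t2) occurs as a two-tag history
    return {"uni": uni, "bi": bi, "tri": tri, "ctx1": ctx1, "ctx2": ctx2,
            "emit": emit, "vocab": vocab, "tags": sorted(uni),
            "n": sum(uni.values())}


def transition(model, t1, t2, t3, lambdas=(0.1, 0.3, 0.6)):
    """P(t3 | t1, t2), smoothed by linear interpolation (assumed weights)."""
    l1, l2, l3 = lambdas
    p1 = model["uni"][t3] / model["n"]
    c1, c2 = model["ctx1"][t2], model["ctx2"][(t1, t2)]
    p2 = model["bi"][(t2, t3)] / c1 if c1 else 0.0
    p3 = model["tri"][(t1, t2, t3)] / c2 if c2 else 0.0
    return l1 * p1 + l2 * p2 + l3 * p3


def emission(model, tag, word):
    """P(word | tag); unknown words get the same value for every tag (assumption)."""
    if word not in model["vocab"]:
        return 1.0 / len(model["tags"])
    return model["emit"][(tag, word)] / model["uni"][tag]


def viterbi2(model, words):
    """Second-order Viterbi: each search state is the pair (previous tag, current tag)."""
    if not words:
        return []
    best = {(START, START): 0.0}   # best log-probability of a path ending in each pair
    back = []                      # back[i][(v, t)] = tag that preceded v on the best path
    for word in words:
        new_best, ptr = {}, {}
        for (u, v), score in best.items():
            for t in model["tags"]:
                e = emission(model, t, word)
                if e == 0.0:
                    continue       # this tag never emitted this known word
                s = score + math.log(transition(model, u, v, t)) + math.log(e)
                if s > new_best.get((v, t), -math.inf):
                    new_best[(v, t)] = s
                    ptr[(v, t)] = u
        best = new_best
        back.append(ptr)
    u, v = max(best, key=best.get)
    out = [v]
    for i in range(len(words) - 1, 0, -1):
        out.append(u)
        u, v = back[i][(u, v)], u
    return out[::-1]


# Toy corpus, invented for this example only.
corpus = [
    [("我", "r"), ("爱", "v"), ("北京", "ns")],
    [("他", "r"), ("爱", "v"), ("音乐", "n")],
    [("我", "r"), ("听", "v"), ("音乐", "n")],
]
model = train(corpus)
print(viterbi2(model, ["她", "爱", "音乐"]))  # "她" is unknown; prints ['r', 'v', 'n']
```

In this sketch the unknown word "她" receives the same emission score under every tag, so its tag is decided entirely by the smoothed transition probabilities. A first-order Viterbi search would keep one score per tag; tracking tag pairs is what lets the trigram history be used, at the cost of a search space that grows with the square of the tag-set size.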

Keywords/Search Tags:

part-of-speech tagging, hidden Markov model, second-order hidden Markov model, Viterbi algorithm

Related items

1	Statistics-based Chinese Pos Tagging Method
2	Application Of Hidden Markov Model In Part-of-Speech Tagging
3	The Research Of Part-of-speech Tagging Based On Hidden Markov Model
4	Hidden Markov Model Parameters Estimation For Part-of-Speech Tagging
5	Statistical Based Mongolian Part-of-Speech Tagging Study And Realization
6	Research On Multiresolution Hidden Markov Model For Image Denoising
7	Research On Method Of Unit Selection Speech Synthesis Based On Hidden Markov Model
8	Research Of Speech Recognition Based On Mixture Feature Extraction And Improved Continuous Hidden Markov Model
9	Research On Human Behavior Recognition Based On High-order Hidden Markov Model
10	Detection Of Cell Division Sequence Based-on Hidden Markov Model