Chinese POS Tagging Study

Posted on: 2008-02-03  Degree: Master  Type: Thesis
Country: China  Candidate: W Zhang  Full Text: PDF
GTID: 2208360215453939  Subject: Education Technology
Abstract/Summary:
As information retrieval and natural language processing (NLP) have developed, introducing NLP techniques into information retrieval has become one of the field's most important trends. Part-of-speech (POS) tagging is a foundation of NLP; it can improve both the effectiveness and the efficiency of information retrieval and therefore plays an important role in that field.

This thesis first studies Chinese word segmentation, the basis of Chinese POS tagging. The segmentation module uses maximum matching as its basic method, applies rules and statistical methods to resolve segmentation ambiguities, and uses dedicated strategies to identify Chinese personal names and high-frequency unknown words. The resulting Chinese word segmentation system, designed and implemented for higher accuracy and faster segmentation, provides a good basis for POS tagging.

The study of POS tagging in this thesis is based on statistical techniques. The tagging system is built mainly on a Hidden Markov Model (HMM): the required POS and word statistics are gathered from a mature annotated corpus, and the Viterbi algorithm is used for tagging. To address data sparseness, Katz smoothing, a simple and efficient data-smoothing algorithm, is applied so that gaps in the statistics do not reduce tagging accuracy. An appropriate POS is also chosen for each unknown word. Experimental results show that this approach achieves higher accuracy and a higher disambiguation rate in POS tagging.

Finally, the design and implementation of the POS tagging system are described, including its module structure and logical module design, and the overall performance of the system is tested.
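
The pipeline summarized in the abstract (maximum-matching segmentation followed by HMM tagging with the Viterbi algorithm) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The lexicon, the tag names (`r`, `v`, `n`), and all probabilities are made-up toy values. The fixed `floor` probability for unseen events is only a crude stand-in for the Katz smoothing the thesis uses. The rule-based and statistical disambiguation and the name recognition steps are not shown.

```python
import math

def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching: take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing in the lexicon matches.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence under a bigram HMM, in log space."""
    if not words:
        return []

    def lp(table, key):
        # Crude floor for unseen events: a stand-in for Katz smoothing, not Katz itself.
        return math.log(table.get(key, floor))

    # best[t][tag] = (log-probability of the best path ending in tag at t, previous tag)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            score, prev = max((best[-1][p][0] + lp(trans_p, (p, tag)), p) for p in tags)
            column[tag] = (score + lp(emit_p, (tag, word)), prev)
        best.append(column)

    # Backtrace from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Toy data (hypothetical): r = pronoun, v = verb, n = noun.
lexicon = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}
tags = ["r", "v", "n"]
start_p = {"r": 0.6, "n": 0.3, "v": 0.1}
trans_p = {("r", "v"): 0.8, ("r", "n"): 0.1, ("v", "n"): 0.7,
           ("v", "v"): 0.2, ("n", "n"): 0.5, ("n", "v"): 0.3}
emit_p = {("r", "我"): 0.5, ("v", "喜欢"): 0.4, ("n", "自然语言"): 0.1,
          ("n", "处理"): 0.1, ("v", "处理"): 0.1}

words = forward_max_match("我喜欢自然语言处理", lexicon)
tagged = list(zip(words, viterbi(words, tags, start_p, trans_p, emit_p)))
print(tagged)
```

Working in log space avoids numeric underflow on long sentences. Swapping the `floor` lookup for a real back-off estimate would change only the `lp` helper.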
Keywords/Search Tags: POS tagging, Chinese word segmentation, HMM, data smoothing, unknown words