Study Of Chinese POS Tagging Based On Maximum Entropy

Posted on: 2009-12-28 | Degree: Master | Type: Thesis
Country: China | Candidate: L Zhang | Full Text: PDF
GTID: 2178360272470654 | Subject: Computer application technology
Abstract/Summary:
Part-of-speech (POS) tagging assigns each word in a text its proper part of speech. As a basic task in natural language processing, it is a necessary preparation for subsequent syntactic analysis or chunk analysis. Errors made in POS tagging can be amplified in later stages of the processing chain and reduce overall precision, so high precision in POS tagging is very important for natural language processing. The purpose of this thesis is to find new methods that improve Chinese POS tagging, so as to better serve syntactic analysis and other downstream processing tasks.

The Maximum Entropy model is a statistical model that is easy to use and achieves good precision. It is commonly used by taking its top-ranked output directly as the result. Experiments show, however, that the correct tag is the model's most probable tag for about 94% of all words and its second most probable tag for about 3% of all words. Ignoring the second-ranked tags therefore discards useful information, and tagging precision can be raised by methods that make use of it.

This thesis proposes three tagging methods: applying a Hidden Markov Model after the Maximum Entropy model, fusing the results of several different Maximum Entropy models, and applying CRFs after the Maximum Entropy model. The core idea of all three methods is to consider the most probable and the second most probable tags at the same time.

Experimental results show that the methods are effective: compared with a single Maximum Entropy model, the three methods raise precision by 0.45%, 0.32% and 1.53% respectively. Of the three, applying CRFs after the Maximum Entropy model gives the best result.

POS tagging is a basic task of natural language processing. The research results in this thesis can not only serve other tasks such as chunk parsing and named entity recognition, but can also be incorporated into practical systems as a specific method of POS tagging.
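
The abstract does not say how the second-ranked tags are passed to the later model, so the following is only a minimal sketch of one way the "Hidden Markov Model after Maximum Entropy" idea could work. It assumes that a Maximum Entropy model has already produced a tag distribution for each word, keeps the two most probable tags per word, and runs a Viterbi search over that reduced lattice using tag-transition probabilities. The function names (`top_k_candidates`, `rescore_top2`), the `floor` smoothing value, the tag labels, and all numbers in the toy data are hypothetical and are not taken from the thesis.

```python
import math


def top_k_candidates(tag_probs, k=2):
    """Return the k most probable (tag, probability) pairs for one word."""
    return sorted(tag_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


def rescore_top2(sentence_probs, transition, k=2, floor=1e-6):
    """Viterbi search over a lattice restricted to each word's top-k tags.

    sentence_probs: list of dicts (tag -> probability), one dict per word,
                    assumed to come from a trained Maximum Entropy model.
    transition:     dict mapping (prev_tag, tag) -> P(tag | prev_tag);
                    unseen pairs fall back to `floor` (an assumed smoothing).
    """
    if not sentence_probs:
        return []
    lattice = [top_k_candidates(p, k) for p in sentence_probs]

    # best[i][tag] = (log score of the best path ending in tag, previous tag)
    best = [{tag: (math.log(max(p, floor)), None) for tag, p in lattice[0]}]
    for i in range(1, len(lattice)):
        column = {}
        for tag, p in lattice[i]:
            score, prev = max(
                (best[i - 1][pt][0]
                 + math.log(transition.get((pt, tag), floor)), pt)
                for pt in best[i - 1]
            )
            column[tag] = (score + math.log(max(p, floor)), prev)
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(best) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy data (invented): the last word's top-ranked tag "v" is a close call.
sentence_probs = [
    {"r": 0.90, "n": 0.10},
    {"u": 0.95, "v": 0.05},
    {"v": 0.55, "n": 0.45},
]
transition = {("r", "u"): 0.5, ("u", "n"): 0.6, ("u", "v"): 0.1}

greedy = [top_k_candidates(p, 1)[0][0] for p in sentence_probs]
rescored = rescore_top2(sentence_probs, transition)
```

On this toy input the greedy top-1 tagging is `["r", "u", "v"]`, while the lattice search returns `["r", "u", "n"]` because the assumed transition probabilities favor `n` after `u`. Restricting the search to two candidates per word keeps the lattice small while still letting context promote the second-ranked tag.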
Keywords/Search Tags: Part-of-speech tagging, Maximum Entropy, Natural Language Processing