Study Of Chinese POS Tagging Based On Maximum Entropy

Posted on:2009-12-28

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:2178360272470654

Subject:Computer application technology

Abstract/Summary:

Part-of-speech (POS) tagging is the process of assigning the proper part of speech to every word in a text. As a basic task in natural language processing, POS tagging is a necessary preparation for subsequent syntactic analysis or chunk analysis. Errors made in POS tagging can be amplified in later stages of the processing chain and reduce their precision, so high precision in POS tagging is very important for natural language processing. The purpose of this thesis is to find new methods that improve Chinese POS tagging, so as to better serve syntactic analysis and other downstream processing tasks.

The Maximum Entropy model is a statistical model that is easy to use and achieves good precision. The usual way to use it is to take the most probable tag for each word directly as the result. Experiments show, however, that the correct tag is the most probable tag for about 94% of all words, and the second most probable tag for about a further 3%. Ignoring the second most probable tag therefore discards useful information, and tagging precision can be raised by methods that make use of it.

This thesis proposes three tagging methods: applying a Hidden Markov Model after the Maximum Entropy model, fusing the results of several different Maximum Entropy models, and applying CRFs after the Maximum Entropy model. The core of all three methods is to consider the most probable and the second most probable tags at the same time.

The experimental results show that the methods are effective: compared with a single Maximum Entropy model, the three methods raise precision by 0.45%, 0.32% and 1.53% respectively. Of the three, applying CRFs after the Maximum Entropy model gives the best result.

Because POS tagging is a basic task of natural language processing, the results of this research can serve other tasks such as chunk parsing and named entity recognition, and can also be incorporated into practical systems as a concrete POS tagging method.
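
The idea of considering the two most probable tags together can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes that a Maximum Entropy model has already produced a tag distribution for each word, keeps the two most probable tags per word, and picks the best tag sequence with a Viterbi search over tag-transition probabilities, loosely in the spirit of the first method (a Hidden Markov Model applied after the Maximum Entropy model). The function names `top_k` and `rescore`, the tag labels, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def top_k(dist, k=2):
    """Return the k most probable (tag, prob) pairs of one word's tag distribution."""
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]


def rescore(candidates, trans, floor=1e-6):
    """Viterbi search over a lattice restricted to each word's candidate tags.

    candidates: one list per word of (tag, prob) pairs from the first-stage model
    trans:      dict mapping (prev_tag, tag) to a transition probability
    floor:      probability assumed for transitions missing from `trans`
    """
    # best[i][tag] = (log score of the best path ending in `tag` at word i, previous tag)
    best = [{tag: (math.log(p), None) for tag, p in candidates[0]}]
    for cands in candidates[1:]:
        column = {}
        for tag, p in cands:
            score, prev = max(
                (prev_score + math.log(trans.get((prev_tag, tag), floor)), prev_tag)
                for prev_tag, (prev_score, _) in best[-1].items()
            )
            column[tag] = (score + math.log(p), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag to recover the sequence.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(best) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical first-stage output for a three-word sentence (r, n, v are toy tags).
dists = [
    {"r": 0.90, "n": 0.10},
    {"v": 0.55, "n": 0.45},  # the second-ranked tag "n" is assumed to be correct here
    {"v": 0.80, "n": 0.20},
]
# Hypothetical tag-transition probabilities.
trans = {
    ("r", "n"): 0.5, ("r", "v"): 0.3,
    ("n", "n"): 0.2, ("n", "v"): 0.6,
    ("v", "n"): 0.3, ("v", "v"): 0.05,
}

greedy = [top_k(d, 1)[0][0] for d in dists]        # take the top tag of each word directly
rescored = rescore([top_k(d) for d in dists], trans)
print(greedy)
print(rescored)
```

With these toy numbers, taking the top tag of each word directly yields `r v v`, while the rescoring step selects `r n v`, because the transition scores favor the second-ranked tag for the middle word. Restricting the search to two candidates per word keeps the lattice small while still covering the cases in which the correct tag is ranked second.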

Keywords/Search Tags:

Part-of-speech tagging, Maximum Entropy, Natural Language Processing

Related items

1	Research On Text Document Information Hiding
2	Chinese Word Found Its Part Of Speech Tagging
3	Research On Parallel Corpora-based Unsupervised Part-of-speech Tagging For Chinese
4	Study Of Kazak Part-of-Speech Tagging Based Upon HMM
5	Research On Part-of-Speech Tagging Algorithms Of Mathematical Corpus Based On Deep Learning
6	The Study And Analysis Of Oracle Bone Inscriptions Based On Statistical Natural Language Processing
7	Research On Kirghiz Basic Part-of-Speech Tagging Based On HMM
8	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
9	Research On Laodian Participle And Part-of-speech Tagging Method
10	Research On Lao Language Part-of-speech Tagging With Multiple Features