Chinese POS Tagging Based On Maximum Entropy

Posted on:2008-03-02

Degree:Master

Type:Thesis

Country:China

Candidate:H X Kong


GTID:2178360242967619

Subject:Computer applications

Abstract/Summary:

Part-of-speech (POS) tagging is the problem of assigning a POS or lexical category to every word in a text. It is a basic task in Natural Language Processing (NLP), and its precision strongly affects later steps such as syntactic analysis and chunk analysis. Errors made during POS tagging propagate through the rest of the processing chain, so tagging POS correctly is of great significance in NLP. The main goal of this thesis is to implement Chinese POS tagging on top of word segmentation and to provide a basis for later syntactic parsing and other NLP tasks.

In this thesis, we first introduce the significance of POS tagging and the current state of research on it. Then, building on a thorough understanding of Maximum Entropy (ME) theory, we implement a Chinese POS tagging system based on ME. Finally, statistical rules and POS confinement are used to tag unlogged (out-of-vocabulary) words.

Different templates introduce different context information into the ME model. Four ME POS tagging models are built in this way, and the template with the highest tagging precision is selected as the final template. To simplify the model, three feature selection methods are used to reduce the ME model's candidate features. To further improve tagging precision, rules and POS confinement are combined with ME. This thesis presents the ME tagging algorithm and its results, as well as the results of the additional tagging of unlogged words.

POS tagging is a comparatively complex problem. Because ME can make full use of a word's context at different levels, we used it for POS tagging and achieved good results. The experimental results show that ME is effective for Chinese POS tagging: the tagging precision is 94.96% in the open test and 63.32% on unlogged words. The POS tagging approaches introduced in this thesis can be used in practical machine translation (MT) systems and can provide a basis for further NLP tasks. Moreover, the research in this thesis can be applied to other NLP tasks, such as information retrieval and text classification.
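The abstract does not spell out the ME model, the four templates, or the three feature selection methods. As a rough illustration only, the following Python sketch shows the general shape of a maximum entropy tagger, which scores a tag t in context h as p(t | h) = exp(Σ_i λ_i f_i(h, t)) / Z(h). Everything concrete in it is an assumption rather than the thesis's design: the context template in the hypothetical `context_predicates` function (current word, neighbouring words, last character, previous tag), the count cutoff that stands in for feature selection, the `allowed` argument as one possible reading of POS confinement, the toy corpus with its r/v/n tags, and training by plain gradient ascent.

```python
import math
from collections import Counter, defaultdict


def context_predicates(words, i, prev_tag):
    """Hypothetical template: current and neighbouring words, last character, previous tag."""
    def w(j):
        return words[j] if 0 <= j < len(words) else "<PAD>"
    return [f"w0={w(i)}", f"w-1={w(i - 1)}", f"w+1={w(i + 1)}",
            f"suf={w(i)[-1]}", f"t-1={prev_tag}"]


class MaxEntTagger:
    def __init__(self, min_count=1):
        self.min_count = min_count  # count cutoff, a stand-in for feature selection
        self.weights = {}           # (predicate, tag) -> lambda
        self.tags = []

    @staticmethod
    def _events(corpus):
        """Turn tagged sentences into (context predicates, gold tag) training events."""
        for sent in corpus:
            words = [w for w, _ in sent]
            prev = "<S>"
            for i, (_, tag) in enumerate(sent):
                yield context_predicates(words, i, prev), tag
                prev = tag

    def _probs(self, preds):
        """p(t | h) = exp(sum of active lambdas) / Z(h), computed stably."""
        scores = {t: sum(self.weights.get((p, t), 0.0) for p in preds) for t in self.tags}
        top = max(scores.values())
        exps = {t: math.exp(s - top) for t, s in scores.items()}
        z = sum(exps.values())
        return {t: e / z for t, e in exps.items()}

    def train(self, corpus, epochs=50, lr=0.5):
        events = list(self._events(corpus))
        self.tags = sorted({t for _, t in events})
        counts = Counter((p, t) for preds, t in events for p in preds)
        # Feature selection by count cutoff: keep (predicate, tag) pairs seen >= min_count times.
        self.weights = {f: 0.0 for f, c in counts.items() if c >= self.min_count}
        for _ in range(epochs):
            grad = defaultdict(float)
            for preds, gold in events:
                probs = self._probs(preds)
                for p in preds:
                    if (p, gold) in self.weights:
                        grad[(p, gold)] += 1.0      # empirical feature count
                    for t, pt in probs.items():
                        if (p, t) in self.weights:
                            grad[(p, t)] -= pt      # expected feature count under the model
            # Gradient ascent on the average conditional log-likelihood.
            for f, g in grad.items():
                self.weights[f] += lr * g / len(events)

    def tag(self, words, allowed=None):
        """Greedy left-to-right decoding; `allowed` optionally confines a word's candidate tags."""
        prev, out = "<S>", []
        for i, word in enumerate(words):
            probs = self._probs(context_predicates(words, i, prev))
            candidates = (allowed or {}).get(word) or probs.keys()
            prev = max(candidates, key=lambda t: probs.get(t, 0.0))
            out.append(prev)
        return out


# Toy segmented corpus (assumed): r = pronoun, v = verb, n = noun.
corpus = [
    [("我", "r"), ("喜欢", "v"), ("音乐", "n")],
    [("他", "r"), ("喜欢", "v"), ("电影", "n")],
    [("我", "r"), ("看", "v"), ("电影", "n")],
]
tagger = MaxEntTagger(min_count=1)
tagger.train(corpus)
print(tagger.tag(["我", "喜欢", "歌曲"]))  # unseen word 歌曲 tagged from context: ['r', 'v', 'n']
```

Classic ME taggers are usually trained with Generalized or Improved Iterative Scaling and decoded with beam search; plain gradient ascent and greedy decoding are used here only to keep the sketch short. Raising `min_count` prunes rare features, and passing, for example, `allowed={"看": {"n"}}` forces that word's tag regardless of the model's scores.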

Keywords/Search Tags:

Part Of Speech (POS), ME, Template, Unlogged Words

Related items

1	Mining The Feature And Emotional Words From Product Reviews Based On The Part Of Speech And Syntactic Relations
2	The Study Of Rule-based Chinese Words Tagging Method
3	Research On Arithmetic Of Speech Recognition Based On Speech Control Vehicle
4	The Research Of Part-of-speech Tagging Based On Hidden Markov Model
5	Research And Implementation For Part-of-speech Tagging Apply In Automatic English Essay Scoring
6	Study Of Kazak Part-of-Speech Tagging Based Upon HMM
7	Research And Implementation Of Modify Chinese Part-of-Speech Tagging Based On FST Technology
8	Research On Lao Language Part-of-speech Tagging With Multiple Features
9	Research On Laodian Participle And Part-of-speech Tagging Method
10	Research And Implementation Of Aspect-level Sentiment Classification Network Based On Part-of-speech Awareness