Chinese POS Tagging Employing Maxent And Word Clustering

Posted on:2011-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:Z Z Li

GTID:2178360305955938

Subject:Computer application technology

Abstract/Summary:

Chinese part-of-speech (POS) tagging is a fundamental task in Chinese information processing and is essential for downstream tasks such as syntactic parsing, chunking, and semantic analysis. This thesis builds a POS tagger based on a Maximum Entropy (MaxEnt) model and word clustering. MaxEnt can combine diverse sources of information without assuming independence between features, and it tends to yield a relatively strong baseline. First, we tag with a Maximum Entropy model alone to establish a baseline. Second, we automatically cluster all the words in the corpus into 1024 clusters. The cluster of each word is then added to the feature template, which alleviates the data sparseness problem to some extent. We try three clustering algorithms (Maximum Mutual Information, function-word based, and high-frequency-word based) and compare them. Because clustering is a form of unsupervised learning, it can exploit large amounts of unlabeled text, reducing the dependence on relatively expensive annotated corpora. In our experiments, the method achieves an accuracy of 93.50% on the 3M TCT training corpus released by CIPS-ParsEval-2009, which is better than the previous method based solely on the Maximum Entropy model. We expect the method to extend to other NLP tasks.
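
As a rough illustration of what adding the word cluster to the feature template might look like, the Python sketch below builds string features for one token of a segmented sentence. Everything in it is an assumption made for illustration: the abstract does not list the actual templates, so the window size, the feature names (`w0`, `c-1`, and so on), the cluster ID format, and the `C_UNK` fallback are all hypothetical. The point it tries to show is that a word unseen in the tagged data can still fire a cluster feature that was seen, which is one way the sparseness problem could be eased.

```python
from typing import Dict, List

# Hypothetical cluster ID for words missing from the cluster map.
UNKNOWN_CLUSTER = "C_UNK"
BOS, EOS = "<BOS>", "<EOS>"


def extract_features(words: List[str], i: int, prev_tag: str,
                     clusters: Dict[str, str]) -> List[str]:
    """Return MaxEnt-style string features for the word at position i.

    `clusters` maps a word to its cluster ID (assumed to come from a
    separate, unsupervised clustering step over unlabeled text).
    """
    def word_at(j: int) -> str:
        # Pad positions outside the sentence with boundary symbols.
        if j < 0:
            return BOS
        if j >= len(words):
            return EOS
        return words[j]

    def cluster_at(j: int) -> str:
        w = word_at(j)
        if w in (BOS, EOS):
            return w
        return clusters.get(w, UNKNOWN_CLUSTER)

    return [
        # Baseline lexical and tag-history features (assumed templates).
        f"w0={word_at(i)}",
        f"w-1={word_at(i - 1)}",
        f"w+1={word_at(i + 1)}",
        f"t-1={prev_tag}",
        # Cluster features added on top of the baseline.
        f"c0={cluster_at(i)}",
        f"c-1={cluster_at(i - 1)}",
        f"c+1={cluster_at(i + 1)}",
    ]


# Toy usage: the last word has no cluster entry, its left neighbor does.
sentence = ["我", "喜欢", "苹果"]
toy_clusters = {"我": "C0017", "喜欢": "C0402"}
feats = extract_features(sentence, 2, "v", toy_clusters)
```

The resulting feature strings could be passed to any multinomial logistic regression (MaxEnt) trainer; which trainer the thesis used is not stated in the abstract.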

Keywords/Search Tags:

Part-of-speech tagging, Maximum Entropy, Word Clustering, Data Sparseness

Related items

1	Study Of Chinese POS Tagging Based On Maximum Entropy
2	Research On Laodian Participle And Part-of-speech Tagging Method
3	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
4	Chinese Word Found Its Part Of Speech Tagging
5	Research And Implementation On Part-Of-Speech Tagging In Automatic English Essay Scoring
6	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
7	Research On The Construction Method Of Burmese Part-of-speech Tagging Corpus
8	Chinese POS Tagging Based On Maximum Entropy
9	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
10	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging