Automatic Identification Of Chinese Coordination Structures

Posted on:2010-06-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Miao

Full Text:PDF

GTID:2178360275959248

Subject:Computer application technology

Abstract/Summary:

Conjunct identification is important for improving both the performance and the efficiency of syntactic parsing. It can also be applied directly in deep NLP applications such as machine translation and information extraction. Previous research on conjunct identification mainly discussed the issue from a theoretical perspective, and a few studies focused on simple conjunct structures. To improve syntactic parsing performance, this thesis uses Chinese TreeBank 5.1 to address conjunct identification with both rule-based and statistical methods.

The thesis analyzes in detail the linguistic features related to conjuncts, which are divided into internal and external features. Internal features comprise the POS sequence within a conjunct, symmetric patterns in the conjunct structure, and recursive conjunct structures. External features concern the distribution of the words located at the left and right boundaries of conjunct structures. These statistics provide abundant and useful information for identifying conjuncts. Based on the similarity among conjuncts and the distribution of boundary words, heuristic rules for identifying conjuncts are extracted automatically. In particular, conjuncts are divided into five types according to the POS tag of their headword, and each type is handled separately.

The thesis also identifies conjuncts with a maximum entropy (ME) model, which treats identification as a classification problem. The ME model determines the left and right boundaries of a conjunct structure using various contextual features. To overcome the problem caused by data sparseness, error-driven learning is proposed to refine the output of the ME model. It first collects candidate revising rules by comparing the manually annotated conjuncts with their automatically identified counterparts. An evaluation function is then used to select the effective revising rules. Finally, the selected rules are applied to rerank the outputs returned by the ME model during testing.

Experiments show that the rule-based method achieves an F1 score of 75.6%, while the ME model reaches a much higher F1 score of 83.7%. Error-driven learning further raises the F1 score to 84.3%, which shows the effectiveness and robustness of the system.
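
The error-driven step described in the abstract (collect candidate revising rules from the differences between manual and automatic conjuncts, then keep the rules an evaluation function judges effective) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the rule format (boundary side, POS tag at the predicted boundary, shift), the net-gain evaluation function, the example tags, and all function names are hypothetical. The rules here shift a boundary directly instead of reranking ME outputs.

```python
# Hypothetical sketch of error-driven rule learning for conjunct boundaries.
# Each example is (pos_tags, predicted_span, gold_span); spans are (left, right)
# token indices. The rule format and scoring are assumptions for illustration.

def candidate_rules(examples):
    """Propose one rule per boundary error: (side, POS tag at predicted boundary, shift)."""
    rules = set()
    for pos_tags, predicted, gold in examples:
        for side in (0, 1):  # 0 = left boundary, 1 = right boundary
            if predicted[side] != gold[side]:
                shift = gold[side] - predicted[side]
                rules.add((side, pos_tags[predicted[side]], shift))
    return rules

def apply_rule(rule, pos_tags, span):
    """Shift one boundary if the POS tag at that boundary matches the rule."""
    side, tag, shift = rule
    if pos_tags[span[side]] != tag:
        return span
    moved = list(span)
    moved[side] += shift
    # Reject shifts that leave the sentence or invert the span.
    if not (0 <= moved[0] <= moved[1] < len(pos_tags)):
        return span
    return tuple(moved)

def score_rule(rule, examples):
    """Net gain of a rule: spans it corrects minus spans it breaks."""
    gain = 0
    for pos_tags, predicted, gold in examples:
        revised = apply_rule(rule, pos_tags, predicted)
        gain += (revised == gold) - (predicted == gold)
    return gain

def select_rules(examples, min_gain=1):
    """Keep only rules whose net gain on the training examples is at least min_gain."""
    scored = [(score_rule(rule, examples), rule) for rule in candidate_rules(examples)]
    return [rule for gain, rule in sorted(scored, reverse=True) if gain >= min_gain]

# Toy training data with illustrative Treebank-style POS tags.
examples = [
    (["NN", "CC", "NN", "VV"], (0, 3), (0, 2)),
    (["JJ", "NN", "CC", "NN", "VV"], (1, 4), (0, 3)),
]
learned = select_rules(examples)
```

On this toy data only the rule that pulls a right boundary off a verb has a positive net gain, so it is the only one kept. At test time the retained rules would be applied to the ME model's predicted spans.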

Keywords/Search Tags:

Coordinate structures, Conjuncts, Rules, Maximum entropy model, Error-driven learning method

Related items

1	A Research On Learning Weights Of Fuzzy Production Rules Based On Maximum Fuzzy Entropy
2	Multiple model control and maximum entropy control of flexible structures: Implementation and evaluation
3	Maximum Entropy Method And Its Applications In Natural Language Processing
4	Transfer Learning Algorithms Based On Maximum Entropy Model
5	A Study Of The Shallow Syntactic Analysis Methods In Vietnamese
6	The Research And Application Of Stochastic Gradient Descent And Dual Coordinate Descent Algorithm
7	Regularized Maximum Entropy Imitation Learning Based On Prior Reward Of Trajectory
8	An Adaptive BLP Access Control Model Based On Maximum Entropy
9	Research On Inverse Reinforcement Learning Based On Maximum Entropy Theory
10	The Application Of Information Entropy In Machine Learning Algorithm