Automatic Identification Of Chinese Coordination Structures

Posted on: 2010-06-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Miao | Full Text: PDF
GTID: 2178360275959248 | Subject: Computer application technology
Abstract/Summary:
Conjunct identification is important for improving both the accuracy and the efficiency of syntactic parsing. It can also be applied directly in deep NLP applications such as machine translation and information extraction. Previous research on conjunct identification has mainly approached the problem from a theoretical perspective, and a few studies focused only on simple conjunct structures. To improve syntactic parsing performance, this paper uses Chinese TreeBank 5.1 to address conjunct identification with both rule-based and statistics-based methods.

This paper analyzes in detail the linguistic features related to conjuncts, which are divided into internal and external features. Internal features comprise the POS sequence within a conjunct, symmetric patterns in the conjunct structure, and recursive conjunct structures. External features concern the distribution of the words located at the left and right boundaries of conjunct structures. These statistics provide rich and useful information for identifying conjuncts.

Based on the similarity among conjuncts and the distribution of boundary words, heuristic rules are extracted automatically to identify conjuncts. In particular, this paper divides conjuncts into five types according to the POS tag of their headword and handles each type separately.

This paper also identifies conjuncts with a maximum entropy (ME) model, treating identification as a classification problem. The ME model determines the left and right boundaries of a conjunct structure using various contextual features. To mitigate the problems caused by data sparseness, error-driven learning is proposed to refine the outputs of the ME model. It first collects candidate revision rules by comparing the manually annotated conjuncts with their automatically identified counterparts. An evaluation function is then used to select the effective revision rules. Finally, the selected rules are used to rerank the outputs returned by the ME model during testing.

Experiments show that the rule-based method achieves an F1 score of 75.6%, while the ME model achieves a much higher F1 score of 83.7%. Error-driven learning further improves the F1 score to 84.3%, which shows the effectiveness and robustness of the system.
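
The error-driven learning step summarized in the abstract (collect candidate revision rules from disagreements, score them with an evaluation function, apply the survivors) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the boundary labels "B" and "O", the use of a single POS tag as the rule context, the net-gain scoring function, the `min_gain` threshold, and all function names. The thesis applies its rules to rerank ME outputs; this sketch simplifies that to rewriting individual labels.

```python
def collect_candidate_rules(gold, predicted, contexts):
    """Propose a rule (context, wrong_label, right_label) at every model error."""
    return {
        (ctx, pred, ref)
        for ref, pred, ctx in zip(gold, predicted, contexts)
        if ref != pred
    }


def score_rule(rule, gold, predicted, contexts):
    """Hypothetical evaluation function: errors fixed minus correct labels broken."""
    ctx, old, new = rule
    gain = 0
    for ref, pred, c in zip(gold, predicted, contexts):
        if c == ctx and pred == old:
            if ref == new:
                gain += 1  # the rule would fix this error
            elif ref == old:
                gain -= 1  # the rule would break a correct label
    return gain


def select_rules(gold, predicted, contexts, min_gain=1):
    """Keep only rules whose net gain on the training data reaches min_gain."""
    scored = [
        (score_rule(rule, gold, predicted, contexts), rule)
        for rule in collect_candidate_rules(gold, predicted, contexts)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best rules first
    return [rule for gain, rule in scored if gain >= min_gain]


def apply_rules(rules, predicted, contexts):
    """Revise model outputs; the highest-gain rule wins for each (context, label)."""
    table = {}
    for ctx, old, new in rules:
        table.setdefault((ctx, old), new)
    return [table.get((c, p), p) for p, c in zip(predicted, contexts)]


# Toy data: "B" marks a conjunct boundary, "O" marks any other position.
gold = ["B", "O", "B", "O", "O"]
predicted = ["O", "O", "B", "B", "O"]
contexts = ["NN", "CC", "NN", "VV", "NN"]

rules = select_rules(gold, predicted, contexts)
revised = apply_rules(rules, predicted, contexts)
```

On this toy data the candidate rule for the "NN" context is rejected because it would break as many correct labels as it fixes, which is the role the abstract assigns to the evaluation function.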
Keywords/Search Tags: Coordinate structures, Conjuncts, Rules, Maximum entropy model, Error-driven learning method