Research On Key Technologies Of Chinese Dependency Parsing

Posted on:2014-04-11

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z H Li

GTID:1268330392972597

Subject:Computer application technology

Abstract/Summary:

Dependency parsing analyzes the syntactic structure of a sentence and converts its word sequence into a tree. A dependency links two words, one of which modifies the other, and its label names the specific relation between them, such as subject, object, or adverbial modifier. Among syntactic formalisms, dependency grammar has attracted growing interest in the parsing community because of three characteristics: (1) its representation is simple, (2) it is easy to annotate, and (3) it is easy to use. The Conference on Computational Natural Language Learning (CoNLL) organized shared tasks on multilingual dependency parsing from 2006 to 2009, which greatly promoted research on dependency parsing. Meanwhile, dependency parsing has been applied more and more widely to machine translation, question answering, text mining, information retrieval, and other tasks.

Research on dependency parsing has two important goals: improving parsing accuracy and improving parsing efficiency. Accurate parses provide reliable syntactic structures for higher-level applications. With the rapid growth of web data, those applications must also process large amounts of information in limited time, so parsing efficiency matters as well. This thesis addresses both issues and consists of four parts.

1. We propose a fast high-order dependency parsing method based on beam search and punctuation. Previously proposed decoding algorithms for high-order dependency parsing are based on dynamic programming and have high time complexity. To address this issue, we propose a beam-search-based decoding algorithm that allows the model to incorporate rich high-order syntactic features while finding an approximately optimal parse tree at lower time complexity. Our beam-search-based high-order dependency parser took part in the CoNLL 2009 shared task on multilingual dependency parsing and semantic role labeling and achieved good results. To further improve parsing efficiency on long sentences, we analyze the characteristics of Chinese and propose using punctuation to segment an input sentence into several sub-sentences and then applying two-stage dependency parsing. Experimental results show that this punctuation-based two-stage method greatly improves parsing speed on long sentences and also substantially increases parsing accuracy on them.

2. We propose joint models for Chinese POS tagging and dependency parsing. Because Chinese has little morphological inflection, POS tagging accuracy for Chinese is much lower than for languages such as English, which leads to severe error propagation in Chinese dependency parsing. Our experiments show that parsing accuracy drops by about 6% when the manual POS tags of the input sentence are replaced with automatic ones produced by a state-of-the-art statistical POS tagger. To address this issue, this thesis jointly optimizes POS tagging and dependency parsing in a single model. 1) We propose several dynamic-programming-based decoding algorithms for the joint models by extending the decoding algorithms for dependency parsing. 2) We present a novel and effective pruning strategy based on marginal probabilities to reduce the search space of candidate POS tags. Experimental results show that our joint models significantly improve on state-of-the-art tagging and parsing accuracy. Detailed analysis shows that the joint method helps resolve syntax-sensitive POS ambiguities.

3. We propose a separately passive-aggressive training algorithm for joint models. Joint models for POS tagging and dependency parsing are dominated by syntactic features, so the POS features fail to contribute their full disambiguation power. To solve this issue, we propose a separately passive-aggressive (SPA) learning algorithm, which updates the POS feature weights and the syntactic feature weights separately, with different update steps, under the joint optimization framework. Compared with the traditional training algorithms averaged perceptron (AP) and passive-aggressive (PA), SPA naturally raises the weights of the POS features and therefore better balances the discriminative power of the POS and syntactic features in the joint models. Experimental results show that our joint models trained with SPA achieve the best tagging and parsing accuracy on both Chinese and English datasets.

4. We propose a new method for exploiting multiple treebanks in dependency parsing with quasi-synchronous grammar (QG). Multiple treebanks with different annotation styles exist for Chinese, and it is attractive to exploit them together to improve parsing accuracy. We present a simple and effective QG-based framework for exploiting multiple monolingual treebanks that follow different annotation guidelines. Several types of transformation patterns (TPs) are designed to capture the systematic annotation inconsistencies among treebanks. Based on these TPs, we design QG features to augment the baseline parsing models. The QG features guide the parsing model toward better decisions, and they fit naturally into the decoding algorithms of the baseline graph-based parsing models. Experimental results show that our method effectively exploits the knowledge in the source treebank and significantly improves parsing accuracy on the target treebank.

In conclusion, based on the characteristics of Chinese, this thesis conducts a thorough study of fast high-order dependency parsing using punctuation, joint POS tagging and dependency parsing, and multiple treebank exploitation, and substantially improves the efficiency and accuracy of dependency parsing on real-world text. We have obtained several preliminary results, which we hope can further motivate progress in natural language processing and in high-level applications such as machine translation and information retrieval.
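As a rough illustration of the punctuation-based segmentation mentioned in part 1, the sketch below splits a tokenized sentence into sub-sentences at punctuation tokens. It is only a minimal sketch under stated assumptions: the function name `split_on_punctuation` and the punctuation set `SPLIT_PUNCT` are hypothetical, and the abstract does not say which punctuation marks the thesis uses or how the two parsing stages combine the sub-sentences, so that step is not shown.

```python
from typing import List

# Assumed set of splitting punctuation (hypothetical; the abstract does not
# list the marks actually used in the thesis).
SPLIT_PUNCT = {"，", "。", "；", "：", "！", "？", ",", ".", ";", ":", "!", "?"}


def split_on_punctuation(tokens: List[str]) -> List[List[str]]:
    """Split a tokenized sentence into sub-sentences.

    Each sub-sentence ends with the punctuation token that closes it;
    trailing tokens without closing punctuation form a final sub-sentence.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in SPLIT_PUNCT:
            # A punctuation token closes the current sub-sentence.
            segments.append(current)
            current = []
    if current:
        # Keep any leftover tokens as the last sub-sentence.
        segments.append(current)
    return segments
```

Under this sketch, each returned sub-sentence could be passed to a parser on its own before a second pass relates the sub-sentences to each other; how the thesis actually performs that second stage is not described in the abstract.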

Keywords/Search Tags:

Dependency Parsing, Beam Search, Joint Models, Separately Passive-Aggressive Training Algorithm, Multiple Treebank Exploitation

Related items

1	Research On Treebank Construction And Application Of Chinese Dependency Parsing
2	Research On Treebank Conversion And Application Of Dependency Parsing
3	Research And Implementation On Dependency Parsing Based On Chinese Treebank
4	Vietnamese Dependent Tree Library Construction And Dependency Analysis Method Research
5	Active Learning For Chinese Dependency Treebank Building
6	Joint Models For Chinese Morphological Syntactic And Semantic Parsing
7	Exploiting Dependency Parsing As An Auxiliary Task To Enhance AMR Parsing
8	Multiple Feature-sets Algorithm Of Dependency Parsing
9	Research On Semi-supervised Domain Adaptation For Chinese Dependency Parsing
10	Semantic Dependency Analysis For Patent Text