
Research On Key Technologies Of Chinese Dependency Parsing

Posted on: 2014-04-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Li
GTID: 1268330392972597
Subject: Computer application technology
Abstract/Summary:
Dependency parsing aims to analyze the syntactic structure of a given sentence, converting the word sequence into a tree. A dependency consists of two words, one of which modifies the other. The label of the dependency represents the specific relation between the two words, such as subject, object, or adverbial modifier. Among different syntactic formalisms, dependency grammar has attracted growing interest in the parsing community because of three characteristics: (1) simplicity of representation, (2) ease of annotation, and (3) ease of use. The Conference on Computational Natural Language Learning (CoNLL) organized shared tasks on multilingual dependency parsing from 2006 to 2009, which greatly promoted research on dependency parsing. Meanwhile, dependency parsing has been applied more and more widely to machine translation, question answering, text mining, information retrieval, and other tasks.

Research on dependency parsing has two important goals: improving parsing accuracy and improving parsing efficiency. Accurate parsing results provide reliable syntactic structures for higher-level applications. With the rapid growth of web data, higher-level applications need to process large amounts of information in limited time, so parsing efficiency is also important. This thesis addresses both issues and consists of four parts.

1. We propose a fast high-order dependency parsing method based on beam search and punctuation. The previously proposed decoding algorithms for high-order dependency parsing are based on dynamic programming and have high time complexity. To address this issue, we propose a beam-search-based decoding algorithm which, on the one hand, allows the model to incorporate rich high-order syntactic features and, on the other hand, finds an approximately optimal parse tree at lower time complexity.
Our beam-search-based high-order dependency parser participated in the CoNLL 2009 shared task on multilingual dependency parsing and semantic role labeling and achieved good results. To further improve parsing efficiency for long sentences, we analyze the characteristics of Chinese and propose using punctuation to segment an input sentence into several sub-sentences and then applying two-stage dependency parsing. Experimental results show that this punctuation-based two-stage parsing method greatly improves parsing speed for long sentences. Meanwhile, parsing accuracy on long sentences also increases substantially.

2. We propose joint models for Chinese POS tagging and dependency parsing. Because Chinese has little morphological inflection, POS tagging accuracy for Chinese is much lower than for languages such as English. This leads to severe error propagation in Chinese dependency parsing. Our experiments show that parsing accuracy drops by about 6% when the manual POS tags of the input sentence are replaced with automatic ones generated by a state-of-the-art statistical POS tagger. To address this issue, this thesis proposes jointly optimizing POS tagging and dependency parsing in a unified model. 1) We propose several dynamic-programming-based decoding algorithms for our joint models by extending the decoding algorithms for dependency parsing. 2) We present a novel and effective pruning strategy based on marginal probabilities to reduce the search space of candidate POS tags. Experimental results show that our joint models significantly improve on state-of-the-art tagging and parsing accuracy. Detailed analysis shows that the joint method helps resolve syntax-sensitive POS ambiguities.

3. We propose a separately passive-aggressive training algorithm for joint models. Joint models for POS tagging and dependency parsing are dominated by syntactic features.
As a result, the POS features fail to contribute their full disambiguation power. To solve this issue, we propose a separately passive-aggressive (SPA) learning algorithm, which separately updates the POS feature weights and the syntactic feature weights with different update steps under the joint optimization framework. Compared with the traditional training algorithms, averaged perceptron (AP) and passive-aggressive (PA), SPA naturally raises the weights of the POS features and therefore better balances the discriminative power of the POS and syntactic features in the joint models. Experimental results show that our joint models trained with SPA achieve the best tagging and parsing accuracy on both Chinese and English datasets.

4. We propose a new method for exploiting multiple treebanks for dependency parsing with quasi-synchronous grammar (QG). Multiple treebanks with different annotation styles exist for Chinese, and it is attractive to exploit them together to improve parsing accuracy. We present a simple and effective QG-based framework for exploiting multiple monolingual treebanks with different annotation guidelines. Several types of transformation patterns (TPs) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on these TPs, we design QG features to augment the baseline parsing models. The QG features guide the parsing model toward better decisions, and they fit naturally into the decoding algorithms of the baseline graph-based parsing models.
Experimental results show that our method effectively exploits the knowledge in the source treebank and significantly improves parsing accuracy on the target treebank.

In conclusion, based on the characteristics of Chinese, this thesis presents a thorough study of fast high-order dependency parsing using punctuation, joint POS tagging and dependency parsing, and multiple treebank exploitation, and substantially improves the efficiency and accuracy of dependency parsing on real-world text. We have obtained several preliminary results, which we hope will further the progress of natural language processing and of high-level applications such as machine translation and information retrieval.
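
As an illustration of the punctuation-based segmentation step mentioned in part 1, the following minimal Python sketch splits a tokenized sentence into sub-sentences at punctuation marks. It is not the procedure used in the thesis: the function name `segment_by_punctuation` and the punctuation inventory `SPLIT_PUNCT` are assumptions made for this example, and the abstract does not say which marks are used or how the two parsing stages combine the sub-sentences.

```python
from typing import List

# Assumed set of segmenting punctuation marks (hypothetical; the abstract
# does not list the marks actually used in the thesis).
SPLIT_PUNCT = {"，", "；", "：", "。", "？", "！"}


def segment_by_punctuation(tokens: List[str]) -> List[List[str]]:
    """Split a tokenized sentence into sub-sentences.

    Each sub-sentence ends with a punctuation mark from SPLIT_PUNCT,
    except possibly the last one if the sentence has no final punctuation.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in SPLIT_PUNCT:
            # Close the current sub-sentence at the punctuation mark.
            segments.append(current)
            current = []
    if current:
        # Trailing words that are not followed by a punctuation mark.
        segments.append(current)
    return segments


if __name__ == "__main__":
    sentence = ["我", "喜欢", "苹果", "，", "他", "喜欢", "梨", "。"]
    for sub in segment_by_punctuation(sentence):
        print(" ".join(sub))
```

In a two-stage setup of the kind the abstract describes, each sub-sentence would presumably be parsed on its own first, with a second stage connecting the resulting partial structures. That second stage is omitted here because the abstract gives no details about it.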
Keywords/Search Tags: Dependency Parsing, Beam Search, Joint Models, Separately Passive-Aggressive Training Algorithm, Multiple Treebank Exploitation