
On Dependency Parsing Optimization Techniques

Posted on: 2016-12-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Ma
Full Text: PDF
GTID: 1318330482454607
Subject: Computer software and theory

Abstract/Summary:
One of the fundamental tasks in Natural Language Processing (NLP) is syntactic parsing, which attracts both industrial and academic attention. Thanks to its structural simplicity, representational flexibility, and ease of data annotation, dependency grammar-based parsing (dependency parsing) has become increasingly popular in NLP research. In addition, many downstream applications take dependency parsers as one of their core components. For example, many statistical machine translation systems employ dependency parsers to perform pre-reordering, and open information extraction systems define templates over dependency trees to extract relations between entities.

With the rapid advances in machine learning techniques and computer hardware, the performance of data-driven dependency parsers has improved significantly. To advance parsing performance further, we optimize parsers in several ways. A statistical parsing system has three core components: feature representation, the decoding algorithm, and the training algorithm. In this thesis, we focus on feature and training optimization. The main contributions are as follows.

We propose a joint model for Part-Of-Speech (POS) tagging and dependency parsing. POS is one of the most important features used for dependency disambiguation, and POS accuracy greatly affects parsing performance. Traditional approaches treat POS tagging as a preprocessing step: before parsing, the input sentence is first tagged automatically by a POS tagger. In our method, we perform tagging and parsing jointly. The advantage is that the joint model can exploit partial dependency structure for POS disambiguation, so it makes better tagging predictions, which in turn improves parsing accuracy.

We propose a neural network-based POS tagger. Parsing accuracy drops significantly when training data and test data come from different domains, and one of the reasons is low tagging accuracy on the target domain.
We adopt a two-phase model to address this problem and provide dependency parsers with high-quality POS features on the target domain. In the first phase, we use unlabeled target-domain data to train an auto-encoder that captures regularities underlying target-domain text. In the second phase, we integrate information from the trained auto-encoder into a neural network-based POS tagger to improve target-domain tagging accuracy. Finally, we conduct experiments to test the effectiveness of our method.

We propose a global training approach based on spurious ambiguity. Spurious ambiguity in dependency parsing means that different action/transition sequences can lead to the same dependency structure. For parsers that exhibit spurious ambiguity, such as transition-based dependency parsers, one can extract multiple action sequences from a single annotated sentence for training. However, to simplify training algorithms, traditional approaches adopt constraints that eliminate spurious ambiguity, and only the sequence satisfying those constraints is used for parser training. Unlike those approaches, our method allows all sequences to participate in training, making full use of the treebank and thereby improving parsing accuracy.

We propose an up-propagation-based punctuation processing method. We first analyze the syntactic role of punctuation and the effect of punctuation on parsing accuracy. Based on this analysis, we propose a method that parses only words and treats punctuation marks as properties of their neighboring words; punctuation properties are propagated upward as dependency arcs are constructed. On the one hand, our method avoids the error propagation caused by punctuation parsing errors; on the other hand, it enables parsers to make better use of punctuation features and thus make better predictions.

Our methods optimize parsing performance in different ways.
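To make the notion of spurious ambiguity concrete, the following is a minimal sketch (not the thesis's actual parser) of an arc-standard transition system, in which two different action sequences derive the same dependency tree for "He eats fish":

```python
def parse(words, actions):
    """Run an arc-standard derivation; return the set of (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), set()
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))       # move next word onto the stack
        elif act == "LEFT-ARC":
            dep = stack.pop(-2)               # stack top heads the item below it
            arcs.add((stack[-1], dep))
        elif act == "RIGHT-ARC":
            dep = stack.pop()                 # item below the top heads the top
            arcs.add((stack[-1], dep))
    return arcs

sentence = ["He", "eats", "fish"]
# Two distinct derivations of the tree eats->He, eats->fish:
seq_a = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
seq_b = ["SHIFT", "SHIFT", "SHIFT", "RIGHT-ARC", "LEFT-ARC"]
assert parse(sentence, seq_a) == parse(sentence, seq_b)  # same arcs, two sequences
```

A constrained oracle would keep only one of these sequences for training; the approach described above instead lets both contribute.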
In the future, we aim to investigate task-specific optimization techniques to make dependency parsers more useful in real-world applications.
Keywords/Search Tags: Dependency Parsing, Part-Of-Speech Tagging, Joint Model, Neural Network, Training