
On Dependency Parsing Optimization Techniques

Posted on: 2016-12-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Ma
Full Text: PDF
GTID: 1318330482454607
Subject: Computer software and theory

Abstract/Summary:
One of the fundamental tasks in Natural Language Processing (NLP) is syntactic parsing, which attracts both industrial and academic attention. Thanks to its structural simplicity, representational flexibility, and ease of data annotation, dependency grammar-based parsing (dependency parsing) has become increasingly popular in NLP research. In addition, many downstream applications take dependency parsers as one of their core components. For example, many statistical machine translation systems employ dependency parsers to perform pre-reordering, and open information extraction systems define templates over dependency trees to extract relations between entities.

With the rapid advances in machine learning techniques and computer hardware, the performance of data-driven dependency parsers has improved significantly. To advance parsing performance further, we optimize parsers in several ways. A statistical parsing system has three core components: feature representation, the decoding algorithm, and the training algorithm. In this thesis, we focus on feature and training optimization. The main contributions are as follows.

We propose a joint model for Part-Of-Speech (POS) tagging and dependency parsing. POS is one of the most important features used for dependency disambiguation, and POS accuracy greatly affects parsing performance. Traditional approaches treat POS tagging as a preprocessing step: before parsing, the input sentence is first tagged automatically by a POS tagger. In our method, we perform tagging and parsing jointly. The advantage is that the joint model can exploit partial dependency structure for POS disambiguation, so it makes better tagging predictions, which in turn improves parsing accuracy.

We propose a neural network-based POS tagger. Parsing accuracy drops significantly when training data and test data come from different domains, and one of the reasons is low tagging accuracy on the target domain.
We adopt a two-phase model to address this problem and provide dependency parsers with high-quality POS features on the target domain. In the first phase, we use unlabeled target-domain data to train an auto-encoder that captures regularities underlying target-domain text. In the second phase, we integrate information from the trained auto-encoder into a neural network-based POS tagger to improve target-domain tagging accuracy. Finally, we conduct experiments to test the effectiveness of our method.

We propose a global training approach based on spurious ambiguity. Spurious ambiguity in dependency parsing means that different action/transition sequences can lead to the same dependency structure. For parsers that exhibit spurious ambiguity, such as transition-based dependency parsers, one can extract multiple action sequences from a single annotated sentence for training. However, to simplify training algorithms, traditional approaches adopt constraints that eliminate spurious ambiguity, and only the sequence satisfying those constraints is used for parser training. Unlike those approaches, our method allows all sequences to participate in training, making full use of the treebank and thereby improving parsing accuracy.

We propose an up-propagation-based punctuation processing method. We first analyze the syntactic role of punctuation and the effect of punctuation on parsing accuracy. Based on this analysis, we propose a method that parses only words and treats punctuation marks as properties of their neighboring words; punctuation properties are propagated upward as dependency arcs are constructed. On the one hand, our method avoids the error propagation caused by punctuation parsing errors; on the other hand, it enables parsers to make better use of punctuation features and thus make better predictions.

Our methods optimize parsing performance in different ways.
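To make the notion of spurious ambiguity concrete, the following is a minimal sketch (not the thesis's actual parser) of an arc-standard transition system, in which two different action sequences derive the same dependency tree for "He eats fish":

```python
def parse(words, actions):
    """Run an arc-standard derivation; return the set of (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), set()
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))       # move next word onto the stack
        elif act == "LEFT-ARC":
            dep = stack.pop(-2)               # stack top heads the item below it
            arcs.add((stack[-1], dep))
        elif act == "RIGHT-ARC":
            dep = stack.pop()                 # item below the top heads the top
            arcs.add((stack[-1], dep))
    return arcs

sentence = ["He", "eats", "fish"]
# Two distinct derivations of the tree eats->He, eats->fish:
seq_a = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
seq_b = ["SHIFT", "SHIFT", "SHIFT", "RIGHT-ARC", "LEFT-ARC"]
assert parse(sentence, seq_a) == parse(sentence, seq_b)  # same arcs, two sequences
```

A constrained oracle would keep only one of these sequences for training; the approach described above instead lets both contribute.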
In the future, we aim to investigate task-specific optimization techniques to make dependency parsers more useful in real-world applications.
Keywords/Search Tags: Dependency Parsing, Part-Of-Speech Tagging, Joint Model, Neural Network, Training