
Research On Treebank Conversion And Application Of Dependency Parsing

Posted on: 2021-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
Full Text: PDF
GTID: 2428330605974765
Subject: Software engineering
Abstract/Summary:
As a key technology in natural language processing (NLP), dependency parsing aims to convert a word sequence into a tree structure, using directed edges as the basic unit to describe the modification relationships between words. At present, the research community mainly focuses on improving parsing accuracy by enhancing parsing models and algorithms. With the development of deep learning, neural-network-based dependency parsing has achieved significant progress. From the perspective of data usage, this thesis tries to utilize multiple heterogeneous treebanks to improve parsing accuracy, in order to provide more reliable syntactic structures for downstream NLP tasks. Furthermore, we obtain different forms of syntactic information from a state-of-the-art parser and then leverage this syntactic information for the task of opinion role labeling. Specifically, this thesis conducts the following studies.

(1) Treebank Conversion Based on Pattern Embedding and SP-TreeLSTM

As a method for utilizing multiple heterogeneous data, treebank conversion can directly and effectively exploit the linguistic knowledge contained in heterogeneous treebanks to boost the performance of target-side parsing. We propose the task of supervised treebank conversion for the first time. First, we manually construct a bi-tree aligned dataset containing about 11K sentences. Then, we propose two simple yet effective treebank conversion approaches based on the state-of-the-art deep biaffine parser. Finally, we convert the source-side treebank into the target-side annotation style with a well-trained conversion model and thus expand the scale of the target-side treebank, boosting parsing accuracy on the target side. Experimental results show that treebank conversion is superior to the widely used multi-task learning (MTL) framework in exploiting multiple heterogeneous treebanks and leads to significantly higher parsing accuracy.

(2) Treebank Conversion and Exploitation Based on Full-TreeLSTM

There are two main challenges for treebank exploitation via treebank conversion. One is how to convert the source-side tree into the target-side tree with high quality (treebank conversion), and the other is how to effectively exploit the converted treebank for higher parsing accuracy on the target side (treebank exploitation). Building on the second chapter, we try to improve the methods of both treebank conversion and exploitation. For treebank conversion, we propose, for the first time, a conversion method based on the Full-TreeLSTM to deeply and efficiently encode the source-side tree. For treebank exploitation, the corpus weighting and concatenation-with-fine-tuning approaches are introduced to weaken the noise contained in the converted treebank. Experimental results on two benchmarks of bi-tree aligned data show that 1) compared with the pattern embedding and SP-TreeLSTM approaches, the proposed Full-TreeLSTM approach is faster and more effective; 2) the corpus weighting and concatenation-with-fine-tuning approaches can both effectively exploit the converted treebank, which further improves the performance of target-side parsing.
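To make the tree-encoding idea concrete, the sketch below shows a generic Child-Sum TreeLSTM cell (Tai et al., 2015), one standard way to compose a node representation bottom-up from its dependents in a source-side tree. It is only an illustration of the family of encoders that TreeLSTM-based conversion builds on, not the thesis's SP-TreeLSTM or Full-TreeLSTM; the dimensions, the recursion driver, and the `children` structure are assumptions made for the example.

```python
import torch
import torch.nn as nn


class ChildSumTreeLSTMCell(nn.Module):
    """Generic Child-Sum TreeLSTM cell (Tai et al., 2015).

    Composes a node's state from its own embedding and the states of
    its children in a dependency tree.
    """

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.W = nn.Linear(input_dim, 3 * hidden_dim)   # i, o, u gates: input part
        self.U = nn.Linear(hidden_dim, 3 * hidden_dim)  # i, o, u gates: child-sum part
        self.W_f = nn.Linear(input_dim, hidden_dim)      # forget gate: input part
        self.U_f = nn.Linear(hidden_dim, hidden_dim)     # forget gate: one per child

    def forward(self, x, child_h, child_c):
        # x: (input_dim,) embedding of the current node
        # child_h, child_c: (num_children, hidden_dim); empty for leaves
        h_sum = child_h.sum(dim=0)                       # summed child hidden states
        i, o, u = (self.W(x) + self.U(h_sum)).chunk(3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # One forget gate per child, applied to that child's cell state.
        f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c


def encode_tree(cell, embeddings, children, node):
    """Recursively encode a tree bottom-up; `children[node]` lists the
    dependents of `node` (hypothetical helper structure)."""
    hs, cs = [], []
    for child in children[node]:
        h, c = encode_tree(cell, embeddings, children, child)
        hs.append(h)
        cs.append(c)
    hidden = cell.U_f.out_features
    child_h = torch.stack(hs) if hs else embeddings.new_zeros((0, hidden))
    child_c = torch.stack(cs) if cs else embeddings.new_zeros((0, hidden))
    return cell(embeddings[node], child_h, child_c)
```

Calling `encode_tree` on the root yields a representation of the whole source-side tree that a conversion model could condition on when predicting target-side heads and labels.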
(3) Syntax-Enhanced Opinion Role Labeling

Opinion role labeling (ORL) is a fine-grained opinion analysis task that aims to answer "who expressed what kind of sentiment towards what", and it has a wide range of real-world applications. Due to the small scale of labeled data, ORL remains challenging for data-driven methods. We alleviate the scarcity of labeled data by introducing information from dependency parsing. First, we extract three forms of syntactic information from the state-of-the-art parser. Then, we investigate and compare different encoding methods to represent the syntactic information and incorporate them into the ORL model in a pipeline fashion. Finally, in order to reduce the error propagation caused by the pipeline, we introduce a novel MTL framework to train the parser and the ORL model simultaneously. We verify our methods on the benchmark MPQA corpus, and experimental results show that 1) syntactic information is highly valuable and significantly strengthens the recognition ability of the ORL model; 2) the soft-parameter-sharing MTL framework effectively alleviates error propagation and further improves the performance of ORL. In addition, we confirm that the contribution from dependency parsing does not fully overlap with popular contextualized word representations (BERT), and our best model outperforms the current state of the art by 4.34% in F1 score.

In summary, this thesis presents an in-depth study on the conversion and exploitation of multiple heterogeneous dependency treebanks, and then applies dependency parsing outputs to the ORL task. We hope that our preliminary progress will contribute to the development of dependency parsing and other tasks in the field of natural language processing.
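The abstract does not detail the soft-parameter-sharing MTL architecture, so the sketch below uses a simpler hard-parameter-sharing layout (one shared encoder with a parsing head and an ORL tagging head) purely to illustrate joint training of the two tasks; the head designs, dimensions, and all names here are assumptions, not the thesis's model.

```python
import torch
import torch.nn as nn


class ParserOrlMTL(nn.Module):
    """Illustrative multi-task model: shared BiLSTM encoder with a
    head-selection parsing head and a token-level ORL tagging head."""

    def __init__(self, vocab_size, n_rels, n_orl_tags, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        enc_dim = 2 * hidden_dim
        self.arc_scorer = nn.Bilinear(enc_dim, enc_dim, 1)  # score of word j heading word i
        self.rel_scorer = nn.Linear(2 * enc_dim, n_rels)    # relation label for (dep, head)
        self.orl_tagger = nn.Linear(enc_dim, n_orl_tags)    # BIO-style opinion role tags

    def encode(self, words):
        h, _ = self.encoder(self.embed(words))               # (batch, seq, 2*hidden)
        return h

    def parse_scores(self, words):
        h = self.encode(words)
        b, n, d = h.shape
        dep = h.unsqueeze(2).expand(b, n, n, d)              # each word as dependent
        head = h.unsqueeze(1).expand(b, n, n, d)             # each word as candidate head
        arc = self.arc_scorer(dep.reshape(-1, d), head.reshape(-1, d)).view(b, n, n)
        rel = self.rel_scorer(torch.cat([dep, head], dim=-1))  # (b, n, n, n_rels)
        return arc, rel

    def orl_scores(self, words):
        return self.orl_tagger(self.encode(words))            # (batch, seq, n_orl_tags)
```

In training, one would typically alternate batches drawn from the treebank and from the ORL corpus, computing a cross-entropy loss for whichever head the batch belongs to, so that both tasks update the shared encoder and syntactic knowledge can flow into the ORL tagger without a hard pipeline.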
Keywords/Search Tags: Dependency Parsing, Bi-tree Aligned Dataset, Supervised Treebank Conversion, Opinion Role Labeling, Multi-task Learning