Research On Treebank Construction And Application Of Chinese Dependency Parsing

Posted on: 2020-08-23 | Degree: Master | Type: Thesis
Country: China | Candidate: L J Guo
GTID: 2428330578480897 | Subject: Computer Science and Technology
Abstract/Summary:
Dependency parsing has made great progress with the development of deep learning. However, parsing performance drops severely when a model is trained on existing treebanks and tested on non-standard web texts. The main reason is that existing treebanks are drawn mainly from canonical texts and give little consideration to the variety of web texts. Therefore, we propose an annotation guideline for Chinese dependency syntax. Then, following the guideline, we construct a large-scale Chinese dependency treebank for multi-domain and multi-source texts (especially web texts). Finally, we attempt to integrate the parsing results into sentence compression. The main work of this paper is as follows:

(1) Propose a new data annotation guideline for Chinese dependency syntax
A public, integrated and systematic annotation guideline for Chinese dependency treebanks is still lacking. Moreover, existing work on Chinese dependency treebanks gives little consideration to the special linguistic phenomena found in web texts. Therefore, after making full reference to previous work and many linguistic studies, we propose a new annotation guideline (about 70 pages in the current version) for Chinese dependency treebanks that is suitable for multi-domain and multi-source texts. In addition, we carefully study the problems encountered in real annotation practice and propose a clear priority strategy for handling difficult cases during annotation, in order to ensure the consistency ratio. We regard this annotation guideline as the theoretical basis for Chinese dependency treebank construction.

(2) Active learning for Chinese dependency treebank construction
We carry out large-scale data annotation based on our recently designed annotation guideline and online annotation system. First, we select data with an active learning method. Then, we use a visualization system to perform programmatic annotation and quality control. Finally, we conduct a detailed analysis of the accuracy and consistency of the annotated datasets, the distribution of dependency types, and a comparison of parsing performance based on the annotated datasets.

(3) Chinese dependency parsing for sentence compression
To apply Chinese dependency syntax information to the sentence compression task, three methods are used for sentence compression on texts from two domains. The first is the traditional method based on syntactic rules. The second is the Bi-directional Long Short-Term Memory network with Conditional Random Field (BiLSTM-CRF) method integrated with syntax information. The last is the multi-task learning (MTL) method. To verify the validity of the methods, this paper manually constructed two Chinese datasets in different fields. Experimental results indicate that the BiLSTM-CRF method achieves the best performance in sentence compression compared with the other methods.

In summary, this paper proposes a new data annotation guideline for Chinese dependency syntax. Then, a large-scale Chinese dependency treebank is constructed under this guideline. Finally, we effectively improve the performance of sentence compression by utilizing parsing results. So far, we have made some preliminary progress. We look forward to further advancing parsing and other high-level applications of natural language processing.
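To make the syntactic-rule approach mentioned in (3) more concrete, the following is a minimal sketch of compressing a sentence by pruning subtrees of a dependency tree. It is not the thesis's rule set: the `Token` class, the `compress` function, the `DROPPABLE` label set and the label names (`subj`, `adv`, `att`, `det`, `obj`) are all hypothetical, chosen only for illustration, and an English toy sentence is used for readability.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str    # surface word
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str  # dependency label (hypothetical names, not the thesis's label set)


# Hypothetical set of labels treated as removable modifiers.
DROPPABLE = {"adv", "att"}


def compress(tokens, droppable=DROPPABLE):
    """Keep a token only if no arc on its path to the root has a droppable label."""

    def pruned(i):
        # Walk from token i (1-based) up towards the root.
        seen = set()
        while i != 0 and i not in seen:  # `seen` guards against malformed cycles
            seen.add(i)
            tok = tokens[i - 1]
            if tok.deprel in droppable:
                return True
            i = tok.head
        return False

    return [t.form for idx, t in enumerate(tokens, start=1) if not pruned(idx)]


# Toy parse of "She quickly read the very long report".
sentence = [
    Token("She", 3, "subj"),
    Token("quickly", 3, "adv"),
    Token("read", 0, "root"),
    Token("the", 7, "det"),
    Token("very", 6, "adv"),
    Token("long", 7, "att"),
    Token("report", 3, "obj"),
]

print(" ".join(compress(sentence)))
```

Under these assumptions the adverbial and attributive modifiers are dropped together with everything they govern, while the core predicate and its arguments are kept. A practical rule set would need label-specific conditions and would depend on the annotation guideline actually used.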
Keywords/Search Tags: Dependency Parsing, Annotation Guideline, Treebank Construction, Sentence Compression, Long Short Term Memory Network