
Research On Graph-based Chinese Dependency Parsing

Posted on: 2022-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: X C Li
GTID: 2518306563476034
Subject: Computer Science and Technology
Abstract/Summary:
Dependency parsing identifies the semantic modification relationships between the words of a sentence and constructs a dependency syntax tree. Dependency trees express the syntactic structure of a sentence concisely and efficiently, and they are widely used in natural language processing tasks such as machine translation and question answering. In Chinese dependency parsing, word segmentation and part-of-speech (POS) tagging must be carried out first. Because this pipeline propagates errors from one stage to the next and prevents the tasks from sharing features, researchers have proposed analyzing all three tasks jointly, and improving the accuracy of all three tasks at the same time has long been a goal of Chinese dependency parsing research.

The two main approaches to dependency parsing are transition-based and graph-based methods. Transition-based parsers long held the highest accuracy, but with growing computing power and advances in deep learning, graph-based parsers can now take full advantage of global decision making and parallel processing, and they have surpassed transition-based models in accuracy on English dependency parsing. This thesis studies graph-based Chinese dependency parsing, focusing on a graph-based joint model of Chinese word segmentation, POS tagging and dependency parsing, and on the use of second-order subtrees in the graph-based model. Previous graph-based work has combined only word segmentation and dependency parsing, so how to incorporate POS tagging is an open problem for joint Chinese dependency parsing models. In addition, existing work has used only first-order subtrees, so how to use second-order subtrees to improve the accuracy of the joint model is an open problem for graph-based methods. To address these two problems, this thesis
proposes a graph-based joint model of the three tasks, designs and implements a joint Chinese dependency parsing model that incorporates second-order subtrees, and evaluates both on the public CTB5 dataset. The main contributions of this thesis are as follows.

(1) A graph-based joint model of Chinese word segmentation, POS tagging and dependency parsing. Unlike word segmentation, POS tagging cannot be merged into dependency parsing directly by recasting it as dependency relations. Instead, we recast POS tagging as a character-level sequence tagging task and combine it with character-level dependency parsing through multi-task learning. Two joint methods are designed for this purpose. The implicit method uses a shared encoding layer to couple character-level POS tagging with character-level dependency parsing. The explicit method uses a tag attention mechanism to integrate vectorized POS tags into character-level dependency parsing. In experiments on the Chinese Treebank 5.1 (CTB5), the implicit and explicit joint models improve the unlabeled dependency parsing F1 score by 0.24% and 0.22%, respectively, showing that both methods effectively improve parsing accuracy. Compared with the latest transition-based model, the implicit joint model improves the F1 scores of POS tagging and unlabeled dependency parsing by 0.80% and 6.49%, and the explicit joint model by 1.01% and 6.47%, respectively, indicating that the proposed graph-based joint model of the three tasks has clear advantages over the transition-based joint model.

(2) A graph-based second-order joint model of Chinese word segmentation, POS tagging and dependency parsing. Existing work assumes that dependency arcs are independent of one another and predicts using only first-order subtrees. We argue that adjacent arcs in the same direction, as well as arcs connected head to tail, are correlated, and we design two corresponding second-order models based on adjacent-sibling and grandparent second-order subtrees, respectively. On the same dataset, both second-order models improve the F1 scores of word segmentation, POS tagging and dependency parsing: the adjacent-sibling model by 0.11%, 0.33% and 0.24%, and the grandparent model by 0.16%, 0.37% and 0.26%, respectively. These results show that the proposed second-order models effectively improve the accuracy of all three tasks in the joint model.
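The character-level recasting in contribution (1) can be illustrated with a small conversion routine. The abstract does not specify the exact scheme, so this is a minimal sketch under one assumed convention: the last character of each word acts as the word's head character, every other character in a word depends on the character that follows it, and each character receives a B-/I- prefixed POS tag. The function names `to_char_level` and `from_char_level` are hypothetical.

```python
from typing import List, Tuple

ROOT = 0  # index 0 denotes the virtual root; tokens are numbered from 1


def to_char_level(words: List[str], pos: List[str], heads: List[int]
                  ) -> Tuple[List[str], List[str], List[int], List[bool]]:
    """Convert a word-level tree (1-based heads, 0 = root) to character level.

    Assumed (hypothetical) scheme: the last character of a word is its head
    character, other characters attach to the next character of the same word,
    and POS tags become per-character B-/I- tags.
    """
    chars: List[str] = []
    char_tags: List[str] = []
    ends: List[int] = []  # 1-based index of each word's last character
    for word, tag in zip(words, pos):
        for i, ch in enumerate(word):
            chars.append(ch)
            char_tags.append(("B-" if i == 0 else "I-") + tag)
        ends.append(len(chars))

    char_heads: List[int] = []
    intra: List[bool] = []  # True for arcs that stay inside a word
    for w, word in enumerate(words):
        start = ends[w] - len(word) + 1
        for c in range(start, ends[w] + 1):
            if c < ends[w]:
                char_heads.append(c + 1)  # attach to the next character
                intra.append(True)
            else:
                h = heads[w]  # word-level arc, carried by the last characters
                char_heads.append(ROOT if h == ROOT else ends[h - 1])
                intra.append(False)
    return chars, char_tags, char_heads, intra


def from_char_level(chars: List[str], char_tags: List[str],
                    char_heads: List[int], intra: List[bool]
                    ) -> Tuple[List[str], List[str], List[int]]:
    """Recover words, POS tags and word-level heads from the character level."""
    words: List[str] = []
    pos: List[str] = []
    ends: List[int] = []
    start = 0
    for i in range(len(chars)):
        if not intra[i]:  # a word ends at character i
            words.append("".join(chars[start:i + 1]))
            pos.append(char_tags[start][2:])  # drop the "B-" prefix
            ends.append(i + 1)
            start = i + 1
    end_to_word = {e: w + 1 for w, e in enumerate(ends)}
    heads = [ROOT if char_heads[e - 1] == ROOT else end_to_word[char_heads[e - 1]]
             for e in ends]
    return words, pos, heads


chars, tags, char_heads, intra = to_char_level(
    ["我", "爱", "北京"], ["PN", "VV", "NR"], [2, 0, 2])
print(tags)        # ['B-PN', 'B-VV', 'B-NR', 'I-NR']
print(char_heads)  # [2, 0, 4, 2]
```

In this example 北 attaches to 京 inside the word 北京, and 京 carries the word-level arc to 爱. Segmentation is thus encoded in the character-level arcs and POS in the character tags, which is what allows a single character-level model to be trained on all three tasks.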
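The adjacent-sibling and grandparent subtrees in contribution (2) can be made concrete by enumerating them from a head array and adding their scores to the first-order arc scores. This is a minimal sketch, assuming precomputed scores stored in dictionaries; the function names and score dictionaries are hypothetical stand-ins for a neural scorer, and tokens may be characters, as in a character-level joint model.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, int, int]


def grandparent_triples(heads: List[int]) -> List[Triple]:
    """(g, h, m) triples where arcs g -> h and h -> m are connected head to tail.

    heads[m - 1] is the head of token m (tokens are 1-based, 0 is the root).
    """
    triples = []
    for m in range(1, len(heads) + 1):
        h = heads[m - 1]
        if h != 0:  # arcs from the root have no grandparent
            triples.append((heads[h - 1], h, m))
    return triples


def sibling_triples(heads: List[int]) -> List[Triple]:
    """(h, s, m) triples where s and m are adjacent dependents of h on the
    same side, s being the one closer to h."""
    triples = []
    n = len(heads)
    for h in range(0, n + 1):
        deps = [m for m in range(1, n + 1) if heads[m - 1] == h]
        left = sorted((d for d in deps if d < h), reverse=True)  # nearest first
        right = sorted(d for d in deps if d > h)                 # nearest first
        for side in (left, right):
            for s, m in zip(side, side[1:]):
                triples.append((h, s, m))
    return triples


def tree_score(heads: List[int],
               arc: Dict[Tuple[int, int], float],
               sib: Dict[Triple, float],
               grd: Dict[Triple, float]) -> float:
    """First-order arc scores plus second-order sibling and grandparent scores.

    Missing keys score 0.0.
    """
    score = sum(arc.get((heads[m - 1], m), 0.0) for m in range(1, len(heads) + 1))
    score += sum(sib.get(t, 0.0) for t in sibling_triples(heads))
    score += sum(grd.get(t, 0.0) for t in grandparent_triples(heads))
    return score


heads = [3, 3, 0, 3]  # tokens 1, 2 and 4 depend on token 3; token 3 on the root
print(sibling_triples(heads))      # [(3, 2, 1)]
print(grandparent_triples(heads))  # [(0, 3, 1), (0, 3, 2), (0, 3, 4)]
arc = {(3, 1): 1.0, (3, 2): 0.5, (0, 3): 2.0, (3, 4): 1.0}
print(tree_score(heads, arc, {(3, 2, 1): 0.3}, {(0, 3, 4): 0.2}))  # about 5.0
```

The sketch only scores a given tree. Finding the highest-scoring tree requires a decoder, such as a second-order extension of Eisner's algorithm; the abstract does not say which decoder the thesis uses.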
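Because a joint model may segment a sentence differently from the gold standard, the F1 scores for the three tasks are computed over words rather than over a fixed token sequence. The abstract does not state the exact matching criteria, so this sketch assumes the convention commonly used for joint Chinese segmentation and parsing: a word is correct if its character span matches, a POS tag if both span and tag match, and an unlabeled arc if both the dependent's span and its head's span (or the root) match. All names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # [start, end) character offsets of a word


def word_spans(words: List[str]) -> List[Span]:
    spans, start = [], 0
    for w in words:
        spans.append((start, start + len(w)))
        start += len(w)
    return spans


def f1(pred: Set, gold: Set) -> float:
    correct = len(pred & gold)
    if correct == 0:
        return 0.0
    p, r = correct / len(pred), correct / len(gold)
    return 2 * p * r / (p + r)


def units(words: List[str], pos: List[str], heads: List[int]):
    """Scoring units for segmentation, POS tagging and unlabeled dependencies."""
    spans = word_spans(words)
    seg = set(spans)
    tag = set(zip(spans, pos))
    # heads are 1-based word indices, 0 = root (represented here as None)
    dep: Set[Tuple[Span, Optional[Span]]] = {
        (s, spans[h - 1] if h > 0 else None) for s, h in zip(spans, heads)}
    return seg, tag, dep


def joint_f1(pred, gold) -> Tuple[float, ...]:
    """pred and gold are (words, pos, heads) triples; returns (seg, pos, dep) F1."""
    return tuple(f1(p, g) for p, g in zip(units(*pred), units(*gold)))


gold = (["我", "爱", "北京"], ["PN", "VV", "NR"], [2, 0, 2])
pred = (["我", "爱", "北", "京"], ["PN", "VV", "NR", "NR"], [2, 0, 2, 3])
print(joint_f1(pred, gold))  # each component is 4/7, about 0.571
```

Splitting 北京 into two words costs precision in all three scores at once, which is why joint models are compared on segmentation, POS and dependency F1 together.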
Keywords/Search Tags: Chinese word segmentation, POS tagging, Dependency parsing, Attention mechanism, Joint learning