
Chinese Dependency Parsing Based On Deep Learning

Posted on: 2020-06-12  Degree: Master  Type: Thesis
Country: China  Candidate: H Liu  Full Text: PDF
GTID: 2428330578957247  Subject: Computer Science and Technology
Abstract/Summary:
Dependency parsing is a key foundational technology in natural language processing. Its goal is to identify the head-modifier relations between the words of a sentence and to build the corresponding dependency tree according to a dependency grammar. Because a dependency tree is a concise and efficient way for a machine to express its understanding of natural language, dependency parsing is widely applied in natural language processing. Compared with English and Japanese, the performance of Chinese dependency parsing is not yet high enough for practical application, for two reasons. First, Chinese has no explicit word boundaries, and errors frequently occur in word segmentation because Chinese characters are highly ambiguous. Second, the lack of surface morphological information makes part-of-speech tagging and dependency parsing more difficult. In Chinese language processing, word segmentation and part-of-speech tagging are performed before dependency parsing, and the errors from these two tasks clearly reduce parsing accuracy. To solve this error-propagation problem, joint models of Chinese word segmentation, part-of-speech tagging and dependency parsing have been proposed, and research has mainly focused on how to use the intermediate results of the three tasks as features during parsing to improve Chinese dependency parsing performance.

This thesis studies Chinese dependency parsing based on deep learning. We exploit the advantages of deep learning to build a transition-based joint model of Chinese word segmentation, part-of-speech tagging and dependency parsing. The main work and contributions are summarized as follows.

(1) We propose an encoding method for representing dependency subtrees. The dependency subtrees built by the joint model during parsing are complex and diverse, which makes subtree features difficult to use. Conventional methods, whether based on feature engineering or on neural networks, extract only a few nodes at the top of the stack as features for predicting actions, rather than all the information in the stack. To solve this problem, we propose a Stack-Tree LSTM that combines the stack structure with a neural network to encode dependency subtrees, so that all dependency subtrees can be used as features without tedious feature engineering. Experimental results show that the proposed method improves parsing of long sentences and long-distance dependencies; the F1 scores on the three tasks reach 97.78%, 93.51% and 79.66% respectively, outperforming existing neural network-based joint models.

(2) We propose a position-based Chinese character encoding method. As the smallest semantic units of Chinese, characters are highly ambiguous, and their meanings depend on the context in which they appear, so it is unreasonable to represent the meaning of each character by a single, context-independent embedding. We use the position of a character within a word as a predictor of its semantics and, following the B/M/E/S position-labeling scheme, design four types of embedding for each character. An attention mechanism is then introduced to guide the learning of the weight distribution, so that the embedding of a character in a given sentence is computed according to its context. Experimental results show that this method effectively improves Chinese word segmentation (+0.3%) and dependency parsing (+0.59%).

(3) We propose an encoder-decoder architecture for Chinese dependency parsing. In existing joint models, the decision layer considers only the current state at each time step, without exploiting history information or the dependencies between actions. To solve these problems, we propose an encoder-decoder architecture: in the encoder, the semantics of each Chinese character and the global context of the input sentence are represented by the position-based character vectors and a bidirectional LSTM; in the decoder, a feature function captures n-gram, part-of-speech and dependency-subtree features at each decoding step, and a sequential LSTM tracks history information to provide richer features for predicting actions at each time step. The F1 scores of Chinese word segmentation, part-of-speech tagging and dependency parsing reach 97.88%, 93.82% and 80.47% respectively, improvements of 0.16%, 0.70% and 1.44% over existing neural network-based methods.

In summary, we propose the above deep learning-based methods for Chinese dependency parsing and conduct experiments on the Penn Chinese Treebank to demonstrate the effectiveness of each component of the proposed method. Compared with existing joint models, our methods outperform the neural network-based joint model on all three tasks and perform especially well on dependency parsing.
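The Stack-Tree LSTM idea of contribution (1) can be illustrated with a minimal sketch: a stack whose running LSTM state summarizes every subtree pushed so far, so that a single vector exposes the whole stack as a feature rather than only its top nodes. All dimensions, the random weights, and the toy composition function below are illustrative stand-ins, not the thesis's trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8  # hypothetical hidden size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One shared LSTM cell; random weights stand in for trained parameters.
Wx = rng.normal(scale=0.1, size=(4 * DIM, DIM))
Wh = rng.normal(scale=0.1, size=(4 * DIM, DIM))
b = np.zeros(4 * DIM)

def lstm_cell(x, h, c):
    z = Wx @ x + Wh @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g
    return np.tanh(c_new) * o, c_new

class StackTreeLSTM:
    """Stack whose running LSTM state summarizes every subtree pushed so far."""
    def __init__(self):
        self.states = [(np.zeros(DIM), np.zeros(DIM))]  # (h, c) of empty stack
        self.trees = []                                 # subtree vectors

    def push(self, vec):
        h, c = self.states[-1]
        self.states.append(lstm_cell(vec, h, c))
        self.trees.append(vec)

    def pop(self):
        self.states.pop()  # rewind the LSTM to the state below the top
        return self.trees.pop()

    def reduce(self, combine):
        # Pop two subtrees, compose them, push the composed representation.
        dep, head = self.pop(), self.pop()
        self.push(combine(head, dep))

    def summary(self):
        return self.states[-1][0]  # one vector encoding ALL stacked subtrees

stack = StackTreeLSTM()
for v in rng.normal(size=(3, DIM)):   # three word/subtree vectors
    stack.push(v)
stack.reduce(lambda h, d: np.tanh(h + d))  # toy head-dependent composition
feat = stack.summary()                # feature for action prediction
```

Because each `pop` simply rewinds to the stored state below, the summary always reflects exactly the subtrees currently on the stack, with no feature engineering.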
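The position-based character encoding of contribution (2) keeps four embeddings per character, one for each B/M/E/S position label, and mixes them with context-dependent attention weights. The sketch below uses random tables, dot-product scoring, and a single context vector purely for illustration; the `char_embedding` helper and all sizes are assumptions, not the thesis's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 100, 8  # hypothetical vocabulary and embedding sizes

# Four embedding tables, one per position label:
# B(egin), M(iddle), E(nd) of a word, or S(ingle-character word).
tables = {p: rng.normal(size=(VOCAB, DIM)) for p in "BMES"}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def char_embedding(char_id, context_vec):
    """Attention-weighted mix of a character's four position embeddings."""
    cand = np.stack([tables[p][char_id] for p in "BMES"])  # (4, DIM)
    scores = cand @ context_vec                            # one score per label
    weights = softmax(scores)                              # attention weights
    return weights @ cand                                  # contextual embedding

ctx = rng.normal(size=DIM)   # stand-in for a context representation
emb = char_embedding(5, ctx)
```

The same character thus receives a different vector in different sentences, since the attention weights over its B/M/E/S embeddings are computed from context.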
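The encoder-decoder of contribution (3) can be outlined as a bidirectional recurrent encoder over character vectors plus a decoder whose hidden state carries the history of predicted actions. The simplified tanh recurrence below stands in for trained LSTMs, and the mean-pooled feature is a placeholder for the thesis's n-gram/part-of-speech/subtree feature function.

```python
import numpy as np

rng = np.random.default_rng(2)
DIM, N_ACTIONS = 8, 5  # hypothetical sizes

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def run_rnn(xs, W):
    # Minimal recurrent pass; a stand-in for a trained LSTM.
    h, out = np.zeros(DIM), []
    for x in xs:
        h = np.tanh(W @ np.concatenate([x, h]))
        out.append(h)
    return out

# --- Encoder: bidirectional pass over position-based character vectors ---
chars = rng.normal(size=(6, DIM))
W_fwd = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
W_bwd = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
fwd = run_rnn(chars, W_fwd)
bwd = run_rnn(chars[::-1], W_bwd)[::-1]
enc = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # one 2*DIM vector per char

# --- Decoder: hidden state tracks the history of predicted actions ---
action_emb = rng.normal(size=(N_ACTIONS, DIM))
W_dec = rng.normal(scale=0.1, size=(DIM, 2 * DIM + DIM + DIM))  # [feat; prev; h]
W_out = rng.normal(scale=0.1, size=(N_ACTIONS, DIM))

h, prev, actions = np.zeros(DIM), np.zeros(DIM), []
for t in range(4):
    feat = np.mean(enc, axis=0)  # placeholder for the step's feature function
    h = np.tanh(W_dec @ np.concatenate([feat, prev, h]))  # h carries history
    probs = softmax(W_out @ h)
    a = int(probs.argmax())      # predicted transition action
    actions.append(a)
    prev = action_emb[a]         # the next step sees the previous action
```

The key point is that `h` and `prev` make each action prediction depend on the full decoding history, unlike a decision layer that sees only the current state.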
Keywords/Search Tags: Deep learning, Chinese dependency parsing, Dependency subtree, Chinese character embedding, Word segmentation, Part-of-speech tagging, Encoder-Decoder