
Study On Cross-lingual Dependency Parsing

Posted on: 2019-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Xiao
Full Text: PDF
GTID: 2428330545965540
Subject: Computer technology
Abstract/Summary:
Dependency parsing is a core, fundamental technology in Natural Language Processing. It builds a dependency syntax tree by identifying the modifier-head relations between the words of a sentence. At present, the accuracy of Chinese dependency parsing is not high enough for practical applications. One reason is that Chinese is harder to parse than English or Japanese, because it offers less surface information to assist parsing. Another reason is that the annotated Chinese corpus is small and difficult to enlarge in a short time. We take English as the source language and Chinese as the target language for the case studies and verification experiments. The paper aims to improve the accuracy of Chinese parsing by exploiting the large-scale annotated data and high-precision parsers available for English. The proposed methods can also be applied to other languages. The work and contributions of the paper are summarized as follows.

(1) We propose an augmented mapping algorithm for projecting dependency relations from English to Chinese, using an English-Chinese parallel corpus. The main mapping problems are caused by noise in the automatic word alignment. The conventional mapping algorithm only considers relations in which each word is aligned with exactly one target word, so the number of projected relations is limited. To address the noise problem, we extend the mapping rules. In this way, uncertain alignments are transformed into one-to-one alignments, and the number of projected relations increases without a loss in mapping accuracy. We verify the augmented mapping algorithm on the open data set CTB5. The experimental results show that the augmented mapping algorithm obtained 3000 more relations than the conventional mapping algorithm, while the mapping accuracy increased by 2%.

(2) We design and implement a mapping-based representation learning method that yields a language-independent feature matrix for English and Chinese words. The representation learning consists of three steps. The first step generalizes the dependency relations using part of speech, dependency direction, and dependency distance, and selects a total of 515 language-independent features for word representation. The second step constructs a preliminary English-Chinese word matrix. The third step restores the missing values of the matrix for the Chinese words with a matrix completion algorithm. The word feature matrix is then used as a set of augmented features for training a language-independent parsing model on the annotated English data, and the model is evaluated on the Chinese test set. The experimental results show that the language-independent parsing model outperformed a model trained on a small amount of annotated Chinese data by 5%.

(3) We propose a cross-lingual parsing method based on transfer learning. We design a shift-reduce parsing framework based on a deep neural network, which consists of an input buffer based on a bidirectional LSTM, a stack that keeps the intermediate results, and a decision component based on a multilayer perceptron. After the English parsing model is trained on the annotated English data, its parameters are used to initialize the Chinese parsing model, which is then trained on the annotated Chinese data. The experimental results show that the accuracy of the Chinese parsing model on the test set improved by 2% compared with a Chinese parsing model trained from random initialization.

This paper proposes the three cross-lingual Chinese dependency parsing methods described above. Experimental comparisons between the proposed methods and the conventional methods show that the accuracy of Chinese parsing improved, which verifies the effectiveness of the proposed methods. In the future, we will explore other deep neural networks to deal with the semantic gap between different languages and further improve cross-lingual dependency parsing.
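
The conventional one-to-one projection mentioned in contribution (1) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function name `project_dependencies`, the index conventions (1-based word positions, 0 for the root), and the toy inputs are assumptions made for illustration. The augmented mapping rules are not specified in this abstract and are not reproduced here.

```python
from collections import defaultdict


def project_dependencies(src_heads, alignments):
    """Project source-side dependency arcs onto target words.

    src_heads:  dict mapping a source word index to its head index
                (1-based positions, 0 denotes the root). Hypothetical format.
    alignments: iterable of (source index, target index) word-alignment pairs.
    Returns a dict mapping a target dependent index to its target head index.
    """
    src_to_tgt = defaultdict(set)
    tgt_to_src = defaultdict(set)
    for s, t in alignments:
        src_to_tgt[s].add(t)
        tgt_to_src[t].add(s)

    # Keep only links that are one-to-one in both directions.
    one_to_one = {}
    for s, tgts in src_to_tgt.items():
        if len(tgts) == 1:
            t = next(iter(tgts))
            if len(tgt_to_src[t]) == 1:
                one_to_one[s] = t

    # An arc is projected only if both of its ends have one-to-one links
    # (or the head is the root).
    projected = {}
    for dep, head in src_heads.items():
        if dep not in one_to_one:
            continue
        if head == 0:
            projected[one_to_one[dep]] = 0
        elif head in one_to_one:
            projected[one_to_one[dep]] = one_to_one[head]
    return projected


# Toy source tree: word 2 is the root; words 1 and 3 depend on it.
heads = {1: 2, 2: 0, 3: 2}

# Clean alignment: every arc is projected.
print(project_dependencies(heads, [(1, 1), (2, 2), (3, 3)]))
# {1: 2, 2: 0, 3: 2}

# Noisy alignment: source word 3 links to two target words, so its arc is lost.
print(project_dependencies(heads, [(1, 1), (2, 2), (3, 3), (3, 4)]))
# {1: 2, 2: 0}
```

The second call shows why a strict one-to-one filter limits the number of projected relations when the automatic alignment is noisy, which is the limitation the augmented mapping rules are meant to address.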
Keywords/Search Tags: Chinese dependency parsing, cross lingual, mapping algorithm, representation learning, transfer learning, deep neural network