Research on High-Performance Chinese Dependency Parsing

Posted on: 2018-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: T Luo
Full Text: PDF
GTID: 2348330512480164
Subject: Software engineering
Abstract/Summary:
Dependency parsing is one of the key technologies in natural language processing (NLP). It analyzes the syntactic structure of a sentence and constructs a dependency tree in accordance with dependency grammar. Because it reflects the modification relations between words, dependency parsing is widely used in semantic role labeling and machine translation. The accuracy of Chinese dependency parsing is limited by the scale and quality of manually annotated treebanks. There is almost no large-scale Chinese dependency treebank, but large-scale unannotated data is relatively easy to obtain. Some researchers have proposed training methods that use large-scale unannotated data, but these methods inevitably introduce considerable noise from automatic annotation. In addition, after analyzing and classifying the errors of Chinese dependency parsing, we find that the main dependency errors are related to verbs. This thesis carries out the following work to improve the accuracy of Chinese dependency parsing.

(1) We propose an approach that iteratively integrates unsupervised features into the training of a Chinese dependency parsing model. Because more errors occur when parsing longer sentences, the raw data is divided according to sentence length and the model is trained iteratively: the model trained on shorter sentences is used in the next iteration to analyze longer sentences. We adopt a character-based dependency model that jointly performs word segmentation, POS tagging and dependency parsing in Chinese. The advantage of the joint model is that each task can benefit from the intermediate results of the other tasks during processing. Higher accuracy on the three tasks for shorter sentences leads to higher accuracy of the model as a whole. The proposed approach is verified on the Penn Chinese Treebank and two raw corpora. The experimental results show that the F1-scores of the three tasks improved at each iteration, and that the F1-score of dependency parsing increased by 0.33% compared with the conventional method.

(2) We explore a Chinese dependency parsing method based on the automatic construction of large-scale case frames. A case frame describes the modification relationships between a verb and the other elements in a sentence. Since the main dependency errors are related to verbs, Chinese case frames can directly target the main errors in Chinese dependency parsing. Given the importance of case frames and the fact that no Chinese case frames have existed until now, we propose an approach that translates Japanese case frames into Chinese using dictionary resources such as a Japanese-Chinese dictionary and a Traditional-Simplified Chinese dictionary. Because some translations are one-to-many, we also propose an approach that selects the best translation of each word with a similarity algorithm based on a monolingual corpus. The experimental results show that 25285 Japanese verbs and 124781 Japanese nouns can be translated, and that each verb has 25.3 modifying nouns on average. This resource is of great significance for resolving Chinese dependency parsing errors.
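The length-based iteration in (1) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say how the unsupervised features are extracted or what the length boundaries are, so `train`, `parse`, and `boundaries` are hypothetical placeholders supplied by the caller.

```python
from typing import Any, Callable, List, Sequence


def bucket_by_length(sentences: Sequence[str], boundaries: Sequence[int]) -> List[List[str]]:
    """Split raw sentences into buckets of increasing length.

    `boundaries` is a hypothetical, ascending list of length thresholds;
    a sentence goes into the bucket indexed by how many thresholds it exceeds.
    """
    buckets: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for sentence in sentences:
        index = sum(len(sentence) > b for b in boundaries)
        buckets[index].append(sentence)
    return buckets


def iterative_training(
    labeled: Sequence[Any],
    raw: Sequence[str],
    boundaries: Sequence[int],
    train: Callable[[Sequence[Any], Sequence[Any]], Any],
    parse: Callable[[Any, str], Any],
) -> Any:
    """Retrain the model once per length bucket, shortest sentences first.

    `train(labeled, auto)` and `parse(model, sentence)` are assumed interfaces:
    `train` is expected to derive unsupervised features from the automatically
    parsed sentences in `auto` and combine them with the labeled treebank.
    """
    auto_parsed: List[Any] = []
    model = train(labeled, auto_parsed)  # baseline model, treebank only
    for bucket in bucket_by_length(raw, boundaries):
        if not bucket:
            continue
        # The current model, trained with features from shorter sentences,
        # analyzes the next, longer group of raw sentences.
        auto_parsed.extend(parse(model, sentence) for sentence in bucket)
        model = train(labeled, auto_parsed)
    return model
```

Processing the shortest bucket first means the noisiest automatic parses (those of long sentences) are produced only by the latest, presumably strongest model.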
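The translation-selection step in (2) can be illustrated with a small sketch. The abstract does not name the similarity algorithm, so this example assumes one plausible choice: cosine similarity over co-occurrence vectors built from a tokenized monolingual corpus, comparing each candidate with words already known to fill the same case slot. The function names, the `window` parameter, and the use of reference slot fillers are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence


def context_vectors(corpus: Sequence[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Build co-occurrence count vectors from a tokenized monolingual corpus."""
    vectors: Dict[str, Counter] = {}
    for sentence in corpus:
        for i, word in enumerate(sentence):
            context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_translation(
    candidates: Sequence[str],
    reference_words: Sequence[str],
    vectors: Dict[str, Counter],
) -> str:
    """Pick the candidate most similar to the reference slot fillers.

    `reference_words` stands for unambiguously translated words in the same
    case slot; this is an assumed setup, not taken from the thesis.
    """
    reference = Counter()
    for word in reference_words:
        reference.update(vectors.get(word, Counter()))
    return max(candidates, key=lambda c: cosine(vectors.get(c, Counter()), reference))


# Toy usage with English placeholder tokens standing in for Chinese words.
toy_corpus = [["eat", "apple"], ["eat", "banana"], ["eat", "rice"], ["open", "bank"]]
toy_vectors = context_vectors(toy_corpus)
chosen = best_translation(["rice", "bank"], ["apple", "banana"], toy_vectors)
```

In the toy data, "rice" shares the context "eat" with the reference words while "bank" does not, so the first candidate scores higher.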
Keywords/Search Tags: Chinese Dependency Parsing, Iterative, Unsupervised Feature, Joint Model, Case Frame