Syntactic analysis is a crucial component of machine translation. One approach to syntactic analysis is dependency analysis, which studies the dependency relationships between the words of a sentence. A dependency relationship between two words explicitly indicates which of the two is the more important one (the head) in that relationship, and it can easily be converted into a semantic dependency description. The Penn Chinese Treebank (CTB) is a public, large-scale corpus available to researchers at home and abroad, but it is annotated with phrase structure and does not label the head child node of each phrase. CTB must therefore be converted into a dependency-structure corpus before it can be used for Chinese dependency analysis. This paper first compiles a Chinese HeadRules table by analyzing the phrase structures in the large-scale corpus together with the special features of Chinese grammar. We then use this table to convert the phrase-structure CTB into a dependency-structure corpus, which provides the data for testing and analysis in the follow-up experiments. The experimental corpus is Penn Chinese Treebank 5.0. After the conversion with the Chinese HeadRules table, we randomly extract 200 sentences and label their dependency relationships manually; measured against these manual labels, the conversion accuracy is 99.95%. We then test and analyze dependency relationships on the new corpus with two parsers: the deterministic Nivre algorithm with consideration of long-distance dependency, and the deterministic Nivre algorithm based on Root Node. Their dependency analysis accuracies are 65.43% and 74.35%, respectively. The first parser builds on the original Nivre algorithm and, taking the characteristics of Chinese grammar into account, also considers dependencies between two words that are far apart. The second parser first divides the original sentence into two relatively simple clauses and then applies the long-distance variant to each clause; this not only reduces the parsing difficulty but also avoids having to consider dependencies that cross the root of the sentence. Finally, we analyze how vocabulary size affects dependency analysis. Accuracy is not highest when the vocabulary contains all words; it peaks when the vocabulary holds 9000 words. As the vocabulary grows further, accuracy declines and the cost of dependency analysis becomes large.
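The conversion from phrase structure to dependency structure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `HEAD_RULES` entries, the function names (`parse_tree`, `choose_head`, `to_dependencies`), the fallback behavior, and the romanized example sentence are all assumptions made for illustration, and the actual Chinese HeadRules table is not reproduced here. The idea is that each phrase picks one child as its head, and the head words of the remaining children become dependents of that head word.

```python
# Hypothetical head-rule table: phrase label -> (search direction, priority list).
# These entries are illustrative only; they are NOT the table from the paper.
HEAD_RULES = {
    "IP": ("right", ["VP", "IP"]),
    "VP": ("left", ["VV", "VP"]),
    "NP": ("right", ["NN", "NR", "NP"]),
}


def parse_tree(text):
    """Parse a bracketed tree into (label, children) or (tag, word) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":  # preterminal node: (TAG word)
            word = tokens[pos]
            pos += 2  # skip the word and the closing ")"
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(read())
        pos += 1  # skip ")"
        return (label, children)

    return read()


def choose_head(label, children):
    """Return the index of the head child according to HEAD_RULES."""
    direction, priorities = HEAD_RULES.get(label, ("right", []))
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for i in order:
            if children[i][0] == wanted:
                return i
    return order[0]  # assumed fallback: first child in the search direction


def to_dependencies(tree):
    """Return (words, heads); heads[i] is the 0-based head index of word i, -1 for the root."""
    words, heads = [], []

    def visit(node):
        label, body = node
        if isinstance(body, str):  # leaf: register the word
            words.append(body)
            heads.append(-1)
            return len(words) - 1
        child_heads = [visit(child) for child in body]
        head = child_heads[choose_head(label, body)]
        for h in child_heads:
            if h != head:
                heads[h] = head  # non-head children depend on the head word
        return head

    visit(tree)
    return words, heads


tree = parse_tree("(IP (NP (NR Zhangsan)) (VP (VV xihuan) (NP (NN yinyue))))")
words, heads = to_dependencies(tree)
for word, head in zip(words, heads):
    print(word, "<-", "ROOT" if head < 0 else words[head])
```

In this toy sentence the verb is selected as the head of both `VP` and `IP`, so the two nouns attach to it and the verb becomes the root. A real table needs an entry for every phrase label in the treebank, which is what the paper's HeadRules table supplies.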