
Research on Vietnamese Dependency Treebank Construction and Dependency Parsing Methods

Posted on: 2017-07-21  Degree: Master  Type: Thesis
Country: China  Candidate: F J Li
GTID: 2358330488964837  Subject: Software engineering
Abstract/Summary:
Vietnam and China are closely linked, and exchanges between the two peoples have a long history. Verbal communication plays a very important role in friendly exchanges and mutual learning between the two sides. Dependency treebanks provide strong support for machine translation, information retrieval, and other upper-layer applications. Statistical machine learning is currently the mainstream approach to dependency parsing, and the quantity and quality of the training data directly affect parsing performance. In recent years, dependency parsing for Chinese, English, and other major languages has achieved good results. In this paper, we study three aspects: construction of a Vietnamese dependency treebank, Vietnamese dependency parsing, and automatic detection of errors in the Vietnamese dependency treebank.

(1) Building a Vietnamese dependency treebank from word alignments in a Chinese-Vietnamese bilingual corpus. Vietnamese has been little studied, so no relatively large dependency treebank has been built for it, and compared with Chinese, which has rich and mature corpora, Vietnamese syntactic analysis is much more difficult. This paper proposes building a Vietnamese dependency treebank through word alignment of a Chinese-Vietnamese bilingual corpus. First, the Chinese-Vietnamese sentence pairs are word-aligned. Second, the Chinese sentences are dependency-parsed. Finally, the Vietnamese dependency treebank is generated from the Chinese-Vietnamese alignment relations and the Chinese dependency trees. Experimental results show that this approach simplifies the manual collection and annotation of a Vietnamese treebank and saves the manpower and time needed to build it.
Experimental results also show that the accuracy of this method is significantly higher than that of machine learning methods.

(2) Building a Vietnamese dependency treebank using the MST algorithm and an improved Nivre's algorithm. We present an approach that integrates the MST parsing algorithm with an improved Nivre's algorithm by means of collaborative learning. First, a small labeled sample is constructed. Second, a weak learning system with two fully redundant views is built, and the two views label a large number of unlabeled samples for each other. Finally, high-confidence samples are selected for retraining to build the dependency parsing system. The system achieves an accuracy of 76.33% under 10-fold cross-validation on the manually annotated Vietnamese dependency treebank. Experimental results show that the proposed method improves accuracy significantly by taking full advantage of the unlabeled corpus.

(3) Rule-based detection and analysis of annotation errors in the dependency treebank. We transform dependency trees into phrase structure trees and detect annotation errors automatically using manually written rules. The method was applied to the treebank produced with the dependency parsing system. Although the treebank had been manually corrected twice beforehand, 1216 errors were detected among the 30000 sentences, with a precision of 100%. The errors fall mainly into three types: word segmentation errors, mismatches between POS and syntactic role, and syntactic role errors. This method can further improve treebank quality and can be applied to other dependency treebanks.

(4) With the resulting high-quality dependency treebank of 30000 sentences, we build a dependency parsing system based on traditional machine learning methods that incorporates Vietnamese language features, and we present the treebank graphically.
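The projection step described in (1) can be illustrated with a minimal sketch. It assumes a one-to-one word alignment and a direct copy of each Chinese head-dependent arc onto the aligned Vietnamese tokens; the function name `project_dependencies`, the data layout, and the toy sentence pair are hypothetical and are not taken from the thesis, which may handle unaligned and many-to-one cases differently.

```python
from typing import Dict, List, Optional

ROOT = 0  # conventional index of the artificial root node


def project_dependencies(
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[int]]:
    """Project a source-side dependency tree onto the target sentence.

    src_heads[i - 1] is the head of source token i (1-based, 0 means root).
    alignment maps a source token index to one target token index (1-based).
    Returns a list where entry j - 1 is the projected head of target token j,
    or None when no head could be projected (assumed to be fixed later).
    """
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for dep, head in enumerate(src_heads, start=1):
        if dep not in alignment:
            continue  # unaligned source token: nothing to project
        tgt_dep = alignment[dep]
        if head == ROOT:
            tgt_heads[tgt_dep - 1] = ROOT
        elif head in alignment:
            tgt_head = alignment[head]
            if tgt_head != tgt_dep:  # skip self-loops from many-to-one links
                tgt_heads[tgt_dep - 1] = tgt_head
    return tgt_heads


# Toy pair (illustrative only): Chinese "红 花" with 红 depending on 花,
# aligned to Vietnamese "hoa đỏ", where the word order is reversed.
chinese_heads = [2, 0]
zh_to_vi = {1: 2, 2: 1}
vietnamese_heads = project_dependencies(chinese_heads, zh_to_vi, tgt_len=2)
```

In this toy case the Vietnamese token "hoa" becomes the root and "đỏ" attaches to it, so the arc survives the change in word order. Target tokens left as `None` are exactly the ones that would still need manual or automatic repair.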
Finally, we build the Vietnamese dependency treebank using the MST algorithm and the improved Nivre's algorithm.
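One round of the collaborative learning in (2) might look like the sketch below. It is a simplification under stated assumptions: the two parsers are passed in as plain callables (`parse_a` standing in for an MST-style parser and `parse_b` for a Nivre-style parser), and agreement between their predicted heads is used as the confidence measure. The abstract does not say how "high-confidence" samples are chosen, so the agreement criterion, the threshold, and all names here are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = Tuple[str, ...]
Heads = Sequence[int]
Parser = Callable[[Sentence], Heads]


def agreement(heads_a: Heads, heads_b: Heads) -> float:
    """Fraction of tokens to which both parsers assign the same head."""
    if len(heads_a) != len(heads_b):
        raise ValueError("parses must cover the same number of tokens")
    if not heads_a:
        return 0.0
    return sum(a == b for a, b in zip(heads_a, heads_b)) / len(heads_a)


def co_training_round(
    labeled: List[Tuple[Sentence, List[int]]],
    unlabeled: List[Sentence],
    parse_a: Parser,
    parse_b: Parser,
    threshold: float = 1.0,
) -> Tuple[List[Tuple[Sentence, List[int]]], List[Sentence]]:
    """Move sentences on which both views agree into the labeled pool.

    Returns the enlarged labeled pool and the sentences still unlabeled.
    Retraining both parsers on the enlarged pool is left to the caller.
    """
    newly_labeled: List[Tuple[Sentence, List[int]]] = []
    remaining: List[Sentence] = []
    for sentence in unlabeled:
        heads_a, heads_b = parse_a(sentence), parse_b(sentence)
        if agreement(heads_a, heads_b) >= threshold:
            newly_labeled.append((sentence, list(heads_a)))
        else:
            remaining.append(sentence)
    return labeled + newly_labeled, remaining
```

A caller would alternate this function with retraining both parsers on the enlarged pool until no new sentences pass the threshold. Lowering `threshold` trades label quality for a larger pool.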
Keywords/Search Tags: Dependency Treebank, Dependency Parsing System, MST Algorithm, Nivre's Algorithm