
Research on Vietnamese Sentence Analysis and Treebank Transformation Method

Posted on: 2018-03-11 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Li | Full Text: PDF
GTID: 2358330518461951 | Subject: Software engineering
Abstract/Summary:
Natural Language Processing (NLP) is a very important part of artificial intelligence research, and the rapid development of artificial intelligence has in turn advanced NLP research. Parsing is one of the key technologies in NLP; its basic task is to determine the syntactic structure of sentences. Syntactic analysis not only serves upper-layer NLP applications but also supports lower-level NLP processing. At present, syntactic analysis of Chinese and English is relatively mature, but there is little research on the syntactic analysis of Vietnamese. Meanwhile, the demand for machine translation between multiple languages has grown, and syntactic parsing has become one of the important factors limiting the development of machine translation. The aim of Vietnamese syntactic parsing and Treebank conversion is to construct a Vietnamese phrase-structure treebank and dependency treebank of reasonable scale, while improving the accuracy and efficiency of Vietnamese syntactic parsing. This paper addresses three aspects: the method of Vietnamese syntactic analysis, Vietnamese Treebank conversion, and syntactic parsing of complex long Vietnamese sentences.

(1) This paper presents a method for constructing a Vietnamese phrase Treebank by fusing Vietnamese grammatical features with an improved PCFG (probabilistic context-free grammar) model. A phrase Treebank is an important resource for NLP research and practical applications, yet Vietnamese lacks such Treebank resources. The method automatically analyzes Vietnamese phrase structure trees and thereby addresses the problem of constructing a Vietnamese phrase Treebank. First, a Vietnamese grammatical feature set is established by analyzing Vietnamese grammatical features. Then, the grammar rule set of the PCFG model is obtained from manually annotated Vietnamese phrase trees. Finally, the Vietnamese grammatical feature set is fused into the improved PCFG model as a supplement, which completes the construction of the Vietnamese phrase Treebank. Experimental results show that the accuracy of the proposed PCFG model for Vietnamese phrase Treebank construction reaches 81.14%, a clear improvement over both the conventional PCFG model and the maximum entropy method.

(2) Syntactic analysis plays a very important role in NLP. However, existing syntactic analysis methods have largely ignored the important role of punctuation, and this is also true for Vietnamese. First, the concept of hierarchical rules is proposed according to how punctuation structures a sentence. Then, a secondary analysis method based on hierarchical punctuation rules is given, built on the sentence-specific features and positional relations of punctuation marks. Finally, punctuation is integrated into the syntactic analysis of long Vietnamese sentences. The experimental data come from the Vietnamese phrase trees of the Pennsylvania Treebank, and contrast experiments were carried out on the syntactic analysis of long Vietnamese sentences. The precision and recall of long-sentence syntactic analysis increased by 2 to 3 percentage points, and the time overhead was reduced by nearly one third. The experimental results show that punctuation is highly beneficial to the syntactic parsing of Vietnamese sentences and greatly improves system performance.

(3) Dependency parsing is a key part of NLP. There is some research on Vietnamese phrase structure treebanks, but little on dependency treebanks. This paper proposes a novel method that combines Vietnamese language features and grammatical features, and uses a head percolation table together with statistical machine learning to convert a Vietnamese phrase structure treebank into a dependency treebank. First, a list of dependency relations was developed according to the Chinese dependency annotation system and Vietnamese grammar rules. Second, a head percolation table was worked out that incorporates the characteristics of the Vietnamese language. Third, the head percolation table was used to carry out a preliminary conversion. Finally, a dependency tagger was used to label the dependencies. The Vietnamese dependency treebank was then enlarged by training MSTParser on the converted Treebank. The precision of the conversion reaches 89.4%. Experimental results show that the proposed method offers a better solution to constituent-to-dependency Treebank conversion for Vietnamese.
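Part (1) obtains the grammar rule set of the PCFG model from manually annotated Vietnamese phrase trees. The abstract does not state the estimator used. The sketch below assumes the standard relative-frequency (maximum-likelihood) estimate P(A -> beta) = count(A -> beta) / count(A). It reads Penn-style bracketed trees and runs on a hypothetical two-sentence toy treebank. It does not model the fusion of Vietnamese grammatical features or the thesis's improvements to the PCFG.

```python
from collections import Counter

def parse_bracketed(text):
    """Parse a Penn-style bracketed tree such as "(S (NP (N tôi)) (VP (V đọc)))"
    into (label, children) tuples; words are kept as plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:                     # a word (terminal)
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read_node()

def count_rules(node, counts):
    """Add one count for the rule at this node, then recurse into phrasal children."""
    label, children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    counts = Counter()
    for text in treebank:
        count_rules(parse_bracketed(text), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical toy "manually annotated" trees (not data from the thesis).
TOY_TREEBANK = [
    "(S (NP (N tôi)) (VP (V đọc) (NP (N sách))))",
    "(S (NP (N bạn)) (VP (V đọc)))",
]
grammar = estimate_pcfg(TOY_TREEBANK)
for (lhs, rhs), p in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

Here VP -> V NP and VP -> V each receive probability 0.5, because each occurs once among the two VP nodes. Real treebanks also need an outer unlabeled bracket and empty elements handled, which this toy reader omits.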
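Part (2) splits long sentences with hierarchical punctuation rules and then runs a secondary analysis. The sketch below only illustrates the general split-then-parse idea. Several parts are hypothetical assumptions rather than the thesis's rules:

- the two-level hierarchy, with `;` and `:` ranked above `,`;
- the length threshold `max_len`;
- the underscore-joined word tokens;
- the `parse_segment` callback.

The thesis also uses the position of each mark within the sentence, which is not modeled here.

```python
# Hypothetical punctuation hierarchy, strongest level first (not the thesis's actual rules).
PUNCT_LEVELS = [{";", ":"}, {","}]

def split_at(tokens, marks):
    """Split tokens after every token in `marks`; each mark stays with the segment it closes."""
    segments, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in marks:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

def hierarchical_split(tokens, levels=PUNCT_LEVELS, max_len=8):
    """Recursively split segments longer than max_len, trying stronger punctuation first."""
    if len(tokens) <= max_len or not levels:
        return [tokens]
    parts = split_at(tokens, levels[0])
    if len(parts) == 1:                  # this level does not split: try the next, weaker one
        return hierarchical_split(tokens, levels[1:], max_len)
    segments = []
    for part in parts:
        segments.extend(hierarchical_split(part, levels, max_len))
    return segments

def parse_long_sentence(tokens, parse_segment, max_len=8):
    """Second pass: parse each short segment, then attach the segment trees under one root."""
    return ("S", [parse_segment(seg) for seg in hierarchical_split(tokens, max_len=max_len)])

sentence = ("hôm_nay trời đẹp , chúng_tôi đi dạo ở công_viên ; "
            "sau_đó chúng_tôi về nhà .").split()
segments = hierarchical_split(sentence)
for seg in segments:
    print(" ".join(seg))
# Stand-in segment "parser" that only wraps the tokens; a real system would call a PCFG parser here.
tree = parse_long_sentence(sentence, lambda seg: ("SEG", seg))
```

The 15-token sentence is first cut at `;`. The first piece is still longer than 8 tokens, so it is cut again at `,`, which yields three short segments. Parsing short segments separately is one plausible source of the reduced time overhead, since chart parsing cost grows quickly with sentence length.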
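Part (2) reports precision and recall for long-sentence parsing, but the abstract does not define how they are computed. The sketch below assumes the common PARSEVAL-style labeled-bracket measure and scores a predicted tree against a gold tree. The tuple tree format and the example trees are hypothetical. Evaluation tools also differ on details such as whether the root bracket or punctuation is scored.

```python
from collections import Counter

def collect_spans(node, start, spans):
    """Record (label, start, end) for every phrasal node and return the end position.
    Preterminals are written (POS, word) and are not counted as brackets."""
    label, children = node
    if isinstance(children, str):
        return start + 1
    end = start
    for child in children:
        end = collect_spans(child, end, spans)
    spans[(label, start, end)] += 1
    return end

def bracket_scores(gold, predicted):
    """Labeled bracket precision, recall and F1, with brackets matched as a multiset."""
    gold_spans, pred_spans = Counter(), Counter()
    collect_spans(gold, 0, gold_spans)
    collect_spans(predicted, 0, pred_spans)
    matched = sum((gold_spans & pred_spans).values())
    precision = matched / sum(pred_spans.values())
    recall = matched / sum(gold_spans.values())
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

gold = ("S", [("NP", [("N", "tôi")]),
              ("VP", [("V", "đọc"), ("NP", [("N", "sách"), ("A", "mới")])])])
predicted = ("S", [("NP", [("N", "tôi")]),
                   ("VP", [("V", "đọc"), ("NP", [("N", "sách")])]),
                   ("AP", [("A", "mới")])])
p, r, f = bracket_scores(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

The gold tree has 4 brackets and the prediction has 5. Only NP(0,1) and S(0,4) match, so precision is 2/5 and recall is 2/4.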
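Part (3) uses a head percolation table for the preliminary constituent-to-dependency conversion. The sketch below shows the generic head-rule technique. The three-entry `HEAD_RULES` table, the tag names and the example tree are hypothetical and are not the thesis's table. The sketch produces only unlabeled arcs; in the thesis, dependency labels are assigned afterwards by a dependency tagger.

```python
# Hypothetical head percolation table: label -> (search direction, child labels by priority).
HEAD_RULES = {
    "S": ("left", ["VP", "S", "NP"]),
    "VP": ("left", ["V", "VP"]),
    "NP": ("left", ["N", "NP"]),
}

def head_child_index(label, children):
    """Pick the head child of a phrase by scanning children in the rule's direction."""
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for i in order:
            if children[i][0] == wanted:
                return i
    return order[0]  # fallback: first child in the search direction

def to_dependencies(tree):
    """Convert a (label, children) tree into unlabeled (dependent, head) arcs.
    Words are numbered from 1 in sentence order; head 0 is an artificial ROOT."""
    words, arcs = [], []

    def visit(node):
        label, children = node
        if isinstance(children, str):        # preterminal (POS, word)
            words.append(children)
            return len(words)
        lexical_heads = [visit(child) for child in children]
        h = head_child_index(label, children)
        for i, dep in enumerate(lexical_heads):
            if i != h:                       # non-head children depend on the head child's word
                arcs.append((dep, lexical_heads[h]))
        return lexical_heads[h]

    root = visit(tree)
    arcs.append((root, 0))
    return words, sorted(arcs)

tree = ("S", [("NP", [("N", "tôi")]),
              ("VP", [("V", "đọc"), ("NP", [("N", "sách"), ("A", "mới")])])])
words, arcs = to_dependencies(tree)
for dep, head in arcs:
    print(words[dep - 1], "->", "ROOT" if head == 0 else words[head - 1])
```

Searching NP children from the left suits head-initial phrases such as sách mới ("new book"), where the adjective follows the noun. A usable table would need many more entries, for example for classifiers and numerals.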
Keywords/Search Tags: syntax analysis, phrase tree, dependency tree, probabilistic context-free grammar, head percolation table, dependency labeling