
Feature Analysis Of Relation Words In Chinese Compound Sentences Based On Tree Kernel Method

Posted on: 2015-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Luo
Full Text: PDF
GTID: 2268330428967674
Subject: Computer technology
Abstract/Summary:
Research on Chinese information processing covers character processing, word processing, sentence processing, and text (discourse) processing. Sentence processing in turn covers both simple sentences and complex sentences. Word processing has been studied extensively with good results, so sentence processing and discourse processing are now the focus of research. Most existing work on sentence processing concentrates on simple sentences, and far less has been done on complex sentences. Complex sentences link the clause and the discourse: studying them deepens our understanding of the clause and also advances discourse processing. The study of complex sentences is therefore both urgent and necessary.

Research on complex sentences includes identifying clauses and non-clauses, automatically identifying the relation words of a complex sentence, dividing a complex sentence into levels, and identifying the relations between its clauses. Automatic identification of relation words is one of the core problems in this area.

This thesis presents a feature analysis of the relation words of complex sentences based on the tree kernel method, from the standpoint of automatic relation word identification. In the automatic identification of relation word combinations, relation words that receive the same identification result have similar syntactic feature information. Different strategies are used to extract feature sequences from the syntax tree of a complex sentence, and tree kernels are applied to compute the similarity of those feature sequences. A tree kernel suited to each kind of feature sequence is chosen for the similarity calculation.
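The abstract does not say which tree kernel is used, so the following is only a minimal sketch of one common choice, the subset tree kernel of Collins and Duffy, which scores two syntax trees by the number of tree fragments they share. The nested-tuple tree encoding, the function names, and the `decay` parameter are assumptions made for illustration and are not taken from the thesis.

```python
from functools import lru_cache

# Assumed encoding: a tree is a nested tuple (label, child, ...);
# a leaf word is a plain string.

def production(node):
    """Return the rule expanded at a node, keeping words and labels distinct."""
    children = tuple(
        ("word", c) if isinstance(c, str) else ("label", c[0]) for c in node[1:]
    )
    return (node[0], children)

def internal_nodes(tree):
    """Yield every non-leaf node of a tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def tree_kernel(t1, t2, decay=0.4):
    """Sum, over all node pairs, the decay-weighted count of shared fragments."""
    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Fragments rooted at n1 and n2 can match only if the rules match.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):
                # Each matching child is either cut off here or expanded further.
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

def normalized_tree_kernel(t1, t2, decay=0.4):
    """Scale the kernel to [0, 1] so that tree size does not dominate."""
    denom = (tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay)) ** 0.5
    return tree_kernel(t1, t2, decay) / denom if denom else 0.0
```

With `decay=1.0` the kernel is a plain count of shared fragments; smaller values down-weight large fragments. A path tree or shortest-path tree extracted around a pair of candidate relation words would be passed in as `t1` and `t2`.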
An SVMLight classifier then produces the relation word identification result from the computed similarities.

Among the feature sequences tested, the shortest path tree of the dependency tree and the path tree of the phrase structure syntax tree that contains lexical (text) information give the best identification results for their respective grammars. A compound kernel, formed as a linear combination of a linear kernel and a convolution kernel, is also applied to the similarity calculation of the feature sequences; with this compound kernel, SVMLight classification achieves the highest identification accuracy.

The experiments show that feature sequences captured with tree kernels improve the automatic identification of relation words. The final results differ considerably depending on the strategy used to extract the feature sequences and on the kernel function chosen for the similarity calculation. These differences motivate further study of feature selection strategies and of the form of the kernel function.
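As a rough illustration of the compound kernel described above, the sketch below combines two precomputed Gram matrices, one from a linear kernel over flat features and one from a convolution (tree) kernel. The weight `alpha`, the normalization step, and the function names are assumptions; the abstract states only that the combination is linear.

```python
import math

def normalize(gram):
    """Scale a Gram matrix so that every self-similarity K[i][i] equals 1."""
    n = len(gram)
    return [
        [
            gram[i][j] / math.sqrt(gram[i][i] * gram[j][j])
            if gram[i][i] > 0 and gram[j][j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]

def compound_kernel(gram_linear, gram_tree, alpha=0.5):
    """Return alpha * K_linear + (1 - alpha) * K_tree, entry by entry."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    k_lin, k_tree = normalize(gram_linear), normalize(gram_tree)
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(row_lin, row_tree)]
        for row_lin, row_tree in zip(k_lin, k_tree)
    ]
```

Normalizing both matrices first keeps either kernel from dominating purely because of its scale. A weighted sum of two valid kernels with non-negative weights is itself a valid kernel, so the result can be handed to any SVM implementation that accepts a custom or precomputed kernel.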
Keywords/Search Tags: relation words of complex sentences, tree kernel method, dependency grammar, phrase structure grammar, feature analysis