
Hierarchy Division of Compound Sentences with Non-saturated Relation Words via Neural Network

Posted on: 2020-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: L L Yang
GTID: 2518305762978949
Subject: Computer software and theory
Abstract/Summary:
Compound sentences, which contain two or more clauses, are an important unit of Chinese grammar. The hierarchical structure and logical semantics of the relations among their clauses are relatively complex. Correctly classifying the hierarchical relations of Chinese compound sentences is of great significance to automatic question answering and machine translation, and is also conducive to the development of text comprehension. A relation mark (relation word) is a word in a compound sentence that connects clauses and indicates the relations between them. When some relation marks are omitted, the hierarchical structure and logical semantic relations of the compound sentence cannot be identified explicitly, which makes it difficult to divide the hierarchical relations of compound sentences with non-saturated relation words. This thesis studies such non-saturated compound sentences with three clauses and uses deep learning to automatically identify their hierarchical attribution. The work done is as follows.

First, punctuation and dependency syntax are used to make a preliminary division of a compound sentence into clauses, and a constructed "independent language" rule base is then used to filter out pseudo clauses, giving an accurate division into clauses.

Second, clause features are extracted in three aspects. For syntactic features, a syntactic parse tree is constructed for each clause and traversed with the depth-first search algorithm to extract the clause's syntactic components, from which the syntactic similarity among clauses is calculated. For semantic features, the core argument of each clause is extracted and its word vector is retrieved from a trained word vector model, from which the semantic similarity among clauses is calculated. For subject features, a subject extractor is designed to extract the subject of each clause; the subjects are compared to judge whether they are the same, and the subject similarity among clauses is calculated.

Third, a hierarchical division model for compound sentences with non-saturated relation words is constructed and trained on the extracted feature data. Analysis of the feature data set shows that the division of the hierarchical relations of compound sentences is closely related to the semantic similarity among clauses. The weight of semantic similarity is therefore increased during training to further improve the accuracy of the model.

Finally, the proposed method is verified as follows. 10,000 compound sentences are selected from the CCCS corpus to test the hierarchical classification model, which reaches an accuracy of 74%. In addition, random forest, support vector machine, and neural network models are tested on the same training and test sets and evaluated by accuracy, recall, precision, ROC curve, and AUC. The neural-network-based hierarchical classification model gives better results, which demonstrates the effectiveness of the method.
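
The three clause-level features described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state which similarity measures were used or how the semantic weight was applied, so the sketch assumes Jaccard overlap for syntactic components, cosine similarity for core-argument word vectors, a 0/1 match for subjects, and a simple scalar multiplier (`semantic_weight`, a hypothetical parameter) for the semantic feature. The dependency labels, vectors, and subjects in the toy data are invented for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(a, b):
    """Overlap of two collections of syntactic component labels (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def subject_similarity(s1, s2):
    """1.0 if both clauses have the same explicit subject, otherwise 0.0."""
    return 1.0 if s1 is not None and s1 == s2 else 0.0


def clause_pair_features(clauses, semantic_weight=2.0):
    """Build one feature vector from all clause pairs of a compound sentence.

    Each pair contributes (syntactic, weighted semantic, subject) similarity.
    Scaling the semantic feature is only one assumed way of "increasing its weight".
    """
    features = []
    for i, j in combinations(range(len(clauses)), 2):
        a, b = clauses[i], clauses[j]
        features.append(jaccard_similarity(a["components"], b["components"]))
        features.append(
            semantic_weight * cosine_similarity(a["core_vector"], b["core_vector"])
        )
        features.append(subject_similarity(a["subject"], b["subject"]))
    return features


# Toy three-clause sentence; all values are illustrative, not from the CCCS corpus.
toy_clauses = [
    {"components": ["SBV", "HED", "VOB"], "core_vector": [1.0, 0.0], "subject": "他"},
    {"components": ["SBV", "HED", "VOB"], "core_vector": [1.0, 0.0], "subject": "他"},
    {"components": ["ADV", "HED"], "core_vector": [0.0, 1.0], "subject": None},
]
features = clause_pair_features(toy_clauses)
print(features)
```

For a three-clause sentence this yields nine values, three per clause pair, which could then be passed to any of the classifiers compared in the abstract (random forest, support vector machine, or neural network). In the toy data the first two clauses are similar on every feature and the third differs, the kind of pattern a classifier might use to group the first two clauses at the lower level.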
Keywords/Search Tags: Non-saturated compound sentences with relation words, Hierarchical recognition, Syntactic features, Semantic features, Neural network