Research On The Relation Inference Of Marked Compound Sentence Based On Neural Network And Feature Fusion

Posted on: 2020-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
GTID: 2518305762478944
Subject: Computer application technology
Abstract/Summary:
Compound sentences are usually composed of two or more clauses that are closely related semantically. These clauses are independent of each other yet semantically interdependent. Relation marks (words) that connect the clauses are grammatical components of the compound sentence. A marked compound sentence is a compound sentence in which at least one relation mark appears. Compound sentence relation inference aims to analyze the semantic relation of a compound sentence and identify its relation category. The results of relation inference are widely used in machine translation, question-answering systems, automatic summarization and other fields to improve the performance of these systems.

There are two main traditional approaches to sentence relation inference. In the first, linguists formulate constraints from linguistic rules and build corresponding rule bases on those constraints. In the second, corpus-based statistical methods are used to summarize the semantic features of compound sentences, followed by feature engineering. Neither approach is ideal. On the one hand, features are extracted with pre-existing natural language processing (NLP) toolkits, which easily introduces error propagation; on the other hand, the feature set is sparse and incomplete and cannot cover all linguistic phenomena, so the acquired features generalize poorly. To obtain the feature set of a compound sentence, extract its semantic information, and capture the semantic association between its two clauses, this thesis introduces semantic representation and deep learning methods. The main contributions of this thesis are as follows.

Firstly, statistics are gathered on a compound sentence corpus composed of "Yangtze's Daily", "People's Daily" and some contemporary novels. The category of each relation mark, its mapping to compound sentence relations, its part of speech and its collocation objects are summarized, and a relation mark library is constructed from these attribute values. The same attributes are tallied for all alternative relation marks, and an alternative relation mark library is constructed. At the same time, based on the above corpus, a corpus of marked compound sentences with two clauses is created and used as the data set for this research.

Secondly, this thesis proposes a long short-term memory (LSTM) model based on a bidirectional attention mechanism. The model first uses a bidirectional LSTM (BiLSTM) to encode the sequences of the two clauses, and then uses the bidirectional attention mechanism to compute similarity weights for the interaction between the clauses; the weights are applied in each direction to capture the mutual information between the clauses. Finally, the semantic information between the clauses of the whole compound sentence is obtained, and the relation category of the compound sentence is inferred.

Thirdly, this thesis proposes a method of fusing feature relation vectors into a convolutional neural network structure. The feature relation vectors are one-hot encoded according to the attributes recorded in the relation mark library. The method uses multiple convolution kernels of different sizes to mine the semantic features of sentences and obtain the most significant features, so as to predict and recognize the relation of the compound sentence.

Finally, the proposed methods are validated on the corpus of two-clause compound sentences. The experimental results show that the LSTM model based on bidirectional attention and the convolutional neural network model based on feature fusion both outperform the rule-based and corpus-based statistical methods, and that the scalability of the neural network models is guaranteed. Because a deep learning framework automatically learns informative features at different levels from the input data, it can bypass complex hand-designed feature modules and reduce error propagation. In addition, the convolutional network model based on feature fusion combines the distinctive features of relation marks and uses multiple convolution kernels of different sizes to capture features at different levels, which both supports the learning of deep features and enhances the learning ability of the model. Therefore, the feature-fusion model performs better than the bidirectional-attention LSTM model.
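
The two neural components named in the abstract, bidirectional attention between clause encodings and multi-kernel convolution fused with one-hot relation-mark features, can be illustrated with a small plain-Python sketch. This is not the thesis implementation: the abstract gives no formulas, so dot-product similarity, softmax weighting, ReLU with max pooling, and fusion by concatenation after pooling are all assumptions, and every function name below is hypothetical. The clause vectors stand in for BiLSTM outputs, which are not computed here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def bidirectional_attention(clause_a, clause_b):
    """Attend from clause A to clause B and from B to A.

    clause_a, clause_b: lists of token vectors (stand-ins for BiLSTM states).
    Assumption: similarity is a plain dot product.
    """
    sim = [[dot(a, b) for b in clause_b] for a in clause_a]
    # A -> B: each token of A gets a weighted summary of clause B.
    a_to_b = [weighted_sum(softmax(row), clause_b) for row in sim]
    # B -> A: use the columns of the same similarity matrix.
    columns = [[sim[i][j] for i in range(len(clause_a))] for j in range(len(clause_b))]
    b_to_a = [weighted_sum(softmax(col), clause_a) for col in columns]
    return a_to_b, b_to_a


def conv_max_pool(token_vectors, kernel):
    """Slide one kernel (width x dim) over the tokens, apply ReLU, max-pool.

    Assumes the sentence has at least as many tokens as the kernel is wide.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        activation = sum(dot(k_row, tok) for k_row, tok in zip(kernel, window))
        feature_map.append(max(0.0, activation))
    return max(feature_map)


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def fused_features(token_vectors, kernels, mark_category, num_categories):
    """Concatenate pooled features from kernels of different widths with a
    one-hot relation-mark attribute (fusion point is an assumption)."""
    pooled = [conv_max_pool(token_vectors, k) for k in kernels]
    return pooled + one_hot(mark_category, num_categories)
```

In a full model the fused vector would feed a classifier over relation categories, and the kernels and encoders would be learned rather than fixed; the sketch only shows how the pieces connect.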
Keywords: Relation category of compound sentence, Relation mark, Deep learning, Feature fusion