Research On Sentence Similarity Calculation Based On Neural Network

Posted on: 2021-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Li
Full Text: PDF
GTID: 2518306467469884
Subject: Software engineering
Abstract/Summary:
Sentence similarity calculation is the process of quantifying the degree of semantic similarity between sentences. As basic research in Natural Language Processing (NLP), it plays an important role in many NLP tasks and has recently attracted wide attention from researchers. Traditional methods of sentence similarity calculation rely on manually extracted features, but they suffer from feature sparsity, which makes the results unsatisfactory. In recent years, rapidly developing neural networks have performed well at extracting semantic features from text and can effectively mine the hidden semantic information in sentences, which alleviates many of the problems faced by manually designed features. As a result, more and more research focuses on applying neural networks to sentence similarity calculation.

Current neural-network-based methods for sentence similarity calculation fall into two main categories. The first is the interaction-based method. It constructs a matching matrix from the relationships between the matching units of the two sentences, uses a convolutional neural network to extract features from that matrix, and calculates the sentence similarity from the extracted features. Because only the interaction of words or phrases between sentences is considered when the matching matrix is constructed, the interaction at the level of the whole sentence is neglected, which significantly affects the similarity result. The second is the representation-based method. It usually uses a sentence encoder to extract different types of features to generate sentence representations and then calculates the sentence similarity from the distance between those representations. However, the local features extracted by the sentence encoder often ignore the sequence-related information in the sentence, and the extracted global features are relatively insufficient, so the accuracy of the final sentence similarity is unsatisfactory.

To address these problems, we propose the following solutions:

1. We propose a sentence similarity calculation model that combines local and global features. The model uses an improved convolutional neural network (CNN) to extract local features of a sentence and represents them as local semantic features corresponding to words. The pre-trained word vectors and the local features are then concatenated to obtain word representations that integrate local semantic information, and the sentence sequence formed by these new word representations is fed into bidirectional gated units. Finally, the global features of the sentence are extracted to represent the sentence, and the Manhattan distance formula is used to calculate the sentence similarity. The experimental results show that our model is effective for sentence similarity calculation and greatly improves the accuracy of measuring sentence similarity.

2. We propose a hybrid interactive sentence similarity calculation model. The model first uses a sentence encoder to encode the sentences and obtain the corresponding sentence representations, and then represents the sentence-level interaction by the vector difference between those representations. Next, a matching matrix is constructed from the semantic relationships between the words of the two sentences, and a convolutional neural network extracts the features in the matrix. Finally, the sentence similarity is obtained by combining the above feature vectors and passing them through a fully connected network. Experimental results on real data sets show that our model is superior to other models of sentence similarity calculation.
Keywords/Search Tags: Sentence similarity calculation, local feature, global feature, interaction information, sentence representation
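The abstract names two computational building blocks without giving their formulas: a Manhattan-distance similarity between sentence representations, and a word-level matching matrix. The sketch below illustrates one plausible form of each and is not taken from the thesis. It assumes that the L1 distance is mapped to a score with exp(-distance), a common choice in Siamese sentence models, and that each matching-matrix entry is the cosine similarity between two word vectors. The function names `manhattan_similarity`, `cosine`, and `matching_matrix` are hypothetical.

```python
import math


def manhattan_similarity(u, v):
    """Map the Manhattan (L1) distance between two sentence vectors to (0, 1].

    Assumption: the exp(-distance) mapping is illustrative; the abstract only
    states that the Manhattan distance formula is used.
    """
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-l1_distance)


def cosine(u, v):
    """Cosine similarity of two word vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_matrix(words_a, words_b):
    """Build a len(words_a) x len(words_b) matrix of word-pair similarities.

    Assumption: cosine similarity stands in for the "semantic relationship
    between words" mentioned in the abstract.
    """
    return [[cosine(wa, wb) for wb in words_b] for wa in words_a]
```

In an interaction-based model, a matrix like the one returned by `matching_matrix` would be the input to the convolutional layers, while a score like `manhattan_similarity` would be applied to the two final sentence representations in a representation-based model.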