Chinese Sentence Similarity Measurement Based On Convolutional Neural Networks

Posted on: 2020-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 2428330572487283
Subject: Information and Communication Engineering
Abstract/Summary:
Web text data has grown explosively in recent years. It is large in volume, grows quickly, and has low value density. How to classify, search, and filter this text has become a research focus in the field of information management. Short text processing, such as sentence similarity calculation, is one of the core technologies of information management and is widely used in text classification, information retrieval, automatic question answering, and other fields. This dissertation uses convolutional neural network models to calculate Chinese sentence similarity. It covers two main research topics.

First, we propose a Chinese sentence similarity calculation method based on sentence structure information. The input of most sentence models is only the information of the sentence itself. To improve feature extraction, some work adds the interaction information between the two sentences. Some even argue that tags are related to one another and therefore add tag information. However, these improvements bring only limited gains in model performance. Compared with English, Chinese sentences have very flexible grammar, and the components of a sentence have complex relationships. In this dissertation, external tools are applied to analyze the dependency syntactic structure of each sentence, and the result is used as the structural information of the sentence. On the basis of the MPCNN model, we propose a DP-MPCNN model that fuses sentence structure information. For the sentence representation matrix we convolve with full-dimensional convolution kernels, and for the sentence structure information we convolve with single-dimensional convolution kernels, so as to extract more sentence features. Experiments on the ChineseSTS data set show that the proposed method benefits both from the sentence structure information supplied as input and from the network structure of the DP-MPCNN model.

Second, we propose a Chinese sentence similarity calculation algorithm based on the attention mechanism. On the one hand, most models for sentence pair matching treat the two sentences completely independently during modeling. They largely ignore the contextual interaction between the input sentences, fail to identify the key words in the sentence pair, and lack semantic detail. On the other hand, a convolutional neural network can only capture local information in a sentence and enlarges its receptive field only by cascading layers. A recurrent neural network models the sequence through recursion, which is essentially a Markov process, so it cannot learn the related words in a sentence well. To address these two problems, this dissertation uses interactive attention to calculate the lexical correlation information between the two sentences in advance, and uses self-attention to extract, in one step, the relationship between the current word and the other words in the sentence, which helps in understanding the meaning of the whole sentence. The attention matrices of these two parts are then fused and used as the input of the convolutional neural network. In addition, this dissertation examines how various information fusion methods affect the calculation results. We compare the method with several existing sentence pair matching models, and it obtains the best results among them.
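
The attention step described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring functions or the fusion rule, so everything here is an assumption made for illustration: cosine similarity for interactive attention, scaled dot-product with a row-wise softmax for self-attention, and simple concatenation as the fusion. The function names and the toy embeddings are hypothetical and are not taken from the dissertation.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def interactive_attention(sent_a, sent_b):
    """Assumed scoring: cosine similarity between each word of A and each word of B."""
    def cos(u, v):
        nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
        return dot(u, v) / (nu * nv) if nu and nv else 0.0
    return [[cos(u, v) for v in sent_b] for u in sent_a]

def self_attention(sent):
    """Assumed scoring: scaled dot-product among words of one sentence, softmaxed per row."""
    d = len(sent[0])
    return [softmax([dot(u, v) / math.sqrt(d) for v in sent]) for u in sent]

def fuse(sent_a, sent_b):
    """Assumed fusion: for each word of A, concatenate its self-attention row
    with its interactive-attention row. The result would be fed to a CNN."""
    self_a = self_attention(sent_a)
    inter = interactive_attention(sent_a, sent_b)
    return [s + i for s, i in zip(self_a, inter)]

# Toy 2-dimensional word embeddings (made-up values, one vector per word).
sent_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sent_b = [[1.0, 0.0], [0.5, 0.5]]
fused = fuse(sent_a, sent_b)  # 3 rows (words of A), 3 + 2 columns each
```

Concatenation is only one possible fusion. The abstract states that several fusion methods were compared, so other choices, such as element-wise combination of same-shaped matrices, would fit the same outline.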
Keywords/Search Tags: Similarity Calculation, Dependency Structure, Attention Mechanism, Convolutional Neural Network, Feature Fusion