Research On Computing Method Of Chinese Sentence Similarity Based On Deep Learning

Posted on: 2020-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
GTID: 2518306464995529
Subject: Software engineering
Abstract/Summary:
Chinese sentence feature extraction and similarity computation is one of the research hotspots of natural language processing. Existing sentence similarity methods do not consider sentence semantics comprehensively, so their similarity results are not accurate enough. This thesis therefore proposes a method for Chinese sentence feature extraction and similarity calculation that covers sentence representation, semantic feature extraction, and regularization parameter selection. The work is divided into the following aspects.

For sentence representation, a Chinese sentence similarity calculation method based on a deep autoencoder is proposed, in which a deep sparse autoencoder extracts the semantic features of sentences. Sentences are first expressed as high-dimensional, sparse vectors. Deep learning is then used to learn the non-linear characteristics of the sentences, transforming the high-dimensional sparse vectors into low-dimensional vectors that capture their essential features. The process is closer to pure end-to-end learning and avoids the need to build a stop-word list. Finally, the low-dimensional features are used directly for sentence similarity calculation.

Manually tuning the regularization parameter during the experiments leads to long model training times. To solve this problem, a method is proposed that calculates the value of the L2 regularization parameter using the concepts of the first-order origin moment, the second-order origin moment, the variance, and maximum likelihood estimation. The method computes these four values from the data matrix X of the data set. In neural network handwritten digit recognition experiments, the method improves accuracy over Bayesian regularization by about 1.14-1.50 percentage points on the Coursera data set and 0.11-0.75 percentage points on the MNIST data set. The method therefore makes the algorithm more efficient, and the results support its validity.

The experimental results show that computing sentence similarity from the extracted sentence features is more accurate than sentence similarity computation based on the relation vector model and than the Jaccard text similarity algorithm based on word embedding. The computational time complexity is only O(n). The regularization parameter of the L2 regularization method used in the sentence similarity experiments is derived from a generalization of the second-order origin moment concept.
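
The following minimal Python sketch illustrates some of the quantities the abstract refers to; it is not the thesis's implementation. It makes several assumptions that the abstract does not confirm: sentences are represented as binary bag-of-characters vectors, similarity between vectors is measured with cosine similarity, and the Jaccard baseline is computed on plain character sets rather than on word embeddings. The trained deep sparse autoencoder is omitted, so the cosine similarity is applied directly to the sparse vectors instead of to low-dimensional encoder outputs. The abstract does not state how the moments of X map to the L2 regularization parameter, so the sketch only computes the moments themselves. All function names are hypothetical.

```python
import math


def build_vocab(sentences):
    """Map each distinct character to a column index.

    Character-level features are an assumption of this sketch; they need
    no word segmentation and no stop-word list.
    """
    chars = sorted({ch for s in sentences for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def to_sparse_vector(sentence, vocab):
    """Binary bag-of-characters vector: high-dimensional and mostly zeros."""
    vec = [0.0] * len(vocab)
    for ch in sentence:
        if ch in vocab:
            vec[vocab[ch]] = 1.0
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(s1, s2):
    """Jaccard index of the two sentences' character sets (simplified baseline)."""
    a, b = set(s1), set(s2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def origin_moments(X):
    """First-order origin moment, second-order origin moment, and variance
    of all entries of the data matrix X (a list of equal-length rows)."""
    values = [x for row in X for x in row]
    n = len(values)
    m1 = sum(values) / n                 # E[x]
    m2 = sum(x * x for x in values) / n  # E[x^2]
    variance = m2 - m1 * m1              # Var[x] = E[x^2] - E[x]^2
    return m1, m2, variance


# Illustrative sentences chosen for this sketch, not taken from the thesis.
sentences = ["我喜欢自然语言处理", "我喜欢语言处理", "今天天气很好"]
vocab = build_vocab(sentences)
X = [to_sparse_vector(s, vocab) for s in sentences]

sim_close = cosine_similarity(X[0], X[1])  # sentences sharing most characters
sim_far = cosine_similarity(X[0], X[2])    # sentences sharing no characters
jac_close = jaccard_similarity(sentences[0], sentences[1])
m1, m2, variance = origin_moments(X)
```

In a fuller pipeline, the rows of `X` would be passed through the trained encoder and `cosine_similarity` would be applied to the resulting low-dimensional vectors.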
Keywords/Search Tags: deep learning, semantic feature extraction, similarity calculation, L2 regularization