
Research On Text Similarity Recognition Based On LSTM

Posted on: 2019-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
GTID: 2428330548961889
Subject: Engineering
Abstract/Summary:
Text similarity computation is a key and challenging task in natural language processing. It is widely used in text categorization, text sentiment analysis, machine translation, information retrieval and other tasks. In recent years, driven by related semantic evaluation competitions, measuring the semantic similarity of texts has attracted many researchers. Because labeled data is very limited, sentence lengths vary and sentence structures are complex, recognizing the semantic similarity between texts remains a difficult problem. Early solutions computed text similarity by combining textual features with various classifiers. As deep learning has developed and matured, a growing number of neural network techniques have been applied to text similarity. In particular, the LSTM model stores information in its memory cells, which gives it a clear advantage in processing sequential data. To model both words and sentences, this thesis proposes a Siamese LSTM model that incorporates word embeddings and an attention mechanism. The experimental results show that the attention-based Siamese LSTM model, trained on paired samples, can learn rich semantic features of text in a high-dimensional space, and that these features outperform other algorithms on text similarity computation.

The main research work is as follows:

1. Text similarity recognition based on traditional methods. Text similarity is recognized with an SVM classifier using feature fusion. Texts have complex structures and variable lengths, and high-dimensional representations produce word vectors that are unrelated to one another. To address this, the Word2Vec word-vector representation is used: it maps the words of high-dimensional text data to lower-dimensional vectors, which avoids the curse of dimensionality in the training samples while keeping the vectors of synonyms semantically close.

2. Text similarity recognition based on Siamese LSTM. When a standard LSTM is trained on each text individually, each resulting vector represents only one text and the texts are treated as independent of each other, so the learned features lose the internal connections between texts. To address this shortcoming, this thesis proposes computing text similarity with a Siamese LSTM that takes text pairs as input samples and uses the LSTM as the training model.

3. Text similarity recognition with an attention mechanism. The attention mechanism is integrated into the Siamese LSTM model, and attention weights, assigned according to the importance of each word in the text, are used to obtain a more accurate text representation. The experimental data show that the text features learned by the Siamese LSTM model with the attention mechanism perform very well.
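The abstract does not say which features are fused for the SVM baseline. The following Python sketch assumes three hypothetical ones (a word-overlap Jaccard score, a length ratio, and the cosine similarity of averaged word vectors) and uses a toy embedding table in place of trained Word2Vec vectors. It also illustrates the property item 1 relies on: synonyms such as "cheap" and "inexpensive" have nearby vectors. The fused vector would then be passed to an SVM classifier, which is not shown.

```python
import math

# Hypothetical toy embeddings standing in for trained Word2Vec vectors
# (real models typically use 100-300 dimensions learned from a corpus).
TOY_VECTORS = {
    "cheap": [0.9, 0.1, 0.0],
    "inexpensive": [0.85, 0.15, 0.05],
    "flight": [0.1, 0.9, 0.2],
    "ticket": [0.2, 0.8, 0.3],
    "hotel": [0.1, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(tokens, vectors):
    """Average the embeddings of in-vocabulary tokens (zero vector if none)."""
    known = [vectors[t] for t in tokens if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def fused_features(tokens_a, tokens_b, vectors):
    """Fuse three hypothetical similarity signals into one SVM input vector."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 0.0
    length_ratio = min(len(tokens_a), len(tokens_b)) / max(len(tokens_a), len(tokens_b), 1)
    embedding_cos = cosine(sentence_vector(tokens_a, vectors),
                           sentence_vector(tokens_b, vectors))
    return [jaccard, length_ratio, embedding_cos]


a = ["cheap", "flight", "ticket"]
b = ["inexpensive", "flight", "ticket"]
c = ["cheap", "hotel"]
ab = fused_features(a, b, TOY_VECTORS)
ac = fused_features(a, c, TOY_VECTORS)
print(ab)
print(ac)
```

Note that the pair (a, b) shares only two of four distinct words, yet its embedding cosine stays high because "cheap" and "inexpensive" have close vectors; this is the kind of signal that plain word overlap misses.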
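To make the weight sharing in a Siamese LSTM concrete, here is a minimal pure-Python sketch of the forward pass only. Both texts of a pair run through the same `LSTMEncoder` instance, so the two branches share all parameters. The similarity function is an assumption, since the abstract does not state one: the sketch uses exp(-L1 distance) between the final hidden states, as in the MaLSTM formulation. The weights are random and untrained and the word vectors are hypothetical; in practice the encoder would be trained on labeled text pairs with a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMEncoder:
    """Single-layer LSTM. The Siamese model uses ONE instance for both inputs,
    so the two branches share every weight."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Hypothetical random initialisation; no training is performed here.
        # One weight matrix per gate (input, forget, output, candidate) acting
        # on the concatenation [x_t; h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in "ifoc"
        }
        self.b = {gate: [0.0] * hidden_dim for gate in "ifoc"}

    def encode(self, sequence):
        """Return the hidden state after each word vector in a non-empty sequence."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        states = []
        for x in sequence:
            z = list(x) + h
            pre = {gate: [s + bias for s, bias in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]        # input gate
            f = [sigmoid(v) for v in pre["f"]]        # forget gate
            o = [sigmoid(v) for v in pre["o"]]        # output gate
            cand = [math.tanh(v) for v in pre["c"]]   # candidate memory
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]  # memory cell update
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            states.append(h)
        return states


def siamese_similarity(encoder, seq_a, seq_b):
    """Assumed MaLSTM-style score exp(-||h_a - h_b||_1), which lies in (0, 1]."""
    h_a = encoder.encode(seq_a)[-1]
    h_b = encoder.encode(seq_b)[-1]
    return math.exp(-sum(abs(x - y) for x, y in zip(h_a, h_b)))


# Hypothetical 3-dimensional word vectors for three short texts.
encoder = LSTMEncoder(input_dim=3, hidden_dim=4, seed=42)
s1 = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2]]
s2 = [[0.85, 0.15, 0.05], [0.1, 0.9, 0.2]]
s3 = [[0.1, 0.3, 0.9]]
print(siamese_similarity(encoder, s1, s2))
print(siamese_similarity(encoder, s1, s3))
```

Because the weights are untrained, the printed scores say nothing about which pair is more similar; only the structure is meaningful here: a shared encoder, a symmetric score in (0, 1], and identical inputs scoring exactly 1.0.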
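The abstract does not specify the form of the attention mechanism. A common choice, assumed here, is additive attention pooling over the LSTM hidden states: each state h_t gets a score e_t = u · tanh(W h_t), the scores are normalised with softmax into weights α_t, and the text representation is Σ α_t h_t rather than only the last hidden state. `W`, `u` and the hidden states below are hypothetical values; in a trained model W and u would be learned together with the LSTM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, W, u):
    """Additive attention: e_t = u . tanh(W h_t), alpha = softmax(e), r = sum_t alpha_t h_t."""
    scores = []
    for h in states:
        proj = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(ui * pi for ui, pi in zip(u, proj)))
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, states)) for k in range(dim)]
    return pooled, weights


# Hypothetical hidden states for a three-word text (one per word) and
# hypothetical attention parameters W (2x3) and u (length 2).
states = [[0.1, 0.0, 0.2], [0.8, 0.7, 0.1], [0.05, 0.1, 0.0]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
u = [1.0, 1.0]
pooled, weights = attention_pool(states, W, u)
print(weights)
print(pooled)
```

With these values the second word receives the largest weight, so the pooled vector is dominated by its hidden state. In a Siamese setup, each branch would pool its own hidden states this way before the two text representations are compared.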
Keywords/Search Tags: Text similarity, Siamese LSTM model, attention mechanism, Word2Vec