Similar News Recommendation Based On Simhash And CNN

Posted on: 2021-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Zhang
GTID: 2428330623968533
Subject: Engineering
Abstract/Summary:
In this digital era, the information on the Internet grows rapidly every day. This information overload prevents users from finding the content they care about most quickly and accurately, and increases the time and effort they spend obtaining information. Similar text detection therefore has an important application in content-based news recommendation. Traditional methods usually recommend on the basis of statistical information about text keywords and ignore the semantic similarity of natural language. With the rise of machine learning and deep learning in recent years, and with progress in natural language processing, especially the breakthrough of word vector techniques in representing the semantic information of text, text similarity detection is no longer limited to statistics. However, when detecting similarity across massive amounts of text, deep learning improves accuracy but gives up the speed and the low computing-resource requirements of the traditional methods.

To improve accuracy while reducing the time and computing resources consumed in similar recommendation over massive news collections, this thesis studies a similar news recommendation algorithm that combines Simhash with a convolutional neural network (CNN). The main idea is to use the Simhash algorithm for a preliminary selection of similar texts, and then to use the convolutional neural network to recommend similar texts with higher accuracy. The main research work consists of three parts.

(1) Building on the traditional Simhash method, the process of obtaining document features is improved: word weights are calculated by jointly considering the TF-IDF value and the part of speech. Because a large number of Simhash values leads to too much retrieval computation, the efficiency of the fast retrieval method is improved. To address the unbalanced distribution of the inverted index table, the elements in unbalanced buckets are hashed again and re-indexed so that the distribution becomes balanced.

(2) Based on an analysis of the advantages and disadvantages of existing text similarity detection models, a two-channel convolutional neural network model is proposed. The word vector model word2vec and the sentence vector model doc2vec are introduced. The two representations of a text pair are computed interactively and used as the feature input of the model, so that the model can learn similarity at two granularities, words and sentences.

(3) The efficiency and accuracy of the proposed algorithm are evaluated experimentally by comparison with various other methods. The results indicate that the proposed method, which combines Simhash and CNN, has practical significance for content-based similar news recommendation.
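The preliminary selection stage described in part (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the MD5-based token hash, the 64-bit fingerprint, the four-band inverted index, and all names (`token_hash`, `simhash`, `SimhashIndex`) are assumptions made for illustration. The weights passed to `simhash` stand in for the combined TF-IDF and part-of-speech weights, and the re-hashing of unbalanced buckets is not shown.

```python
import hashlib
from collections import defaultdict


def token_hash(token, bits=64):
    """Map a token to a `bits`-wide integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(weights, bits=64):
    """Compute a Simhash fingerprint from a {token: weight} mapping.

    The weights are assumed to be precomputed, e.g. a TF-IDF value
    scaled by a part-of-speech factor (hypothetical weighting).
    """
    acc = [0.0] * bits
    for token, w in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            # Add the weight where the hash bit is 1, subtract it otherwise.
            acc[i] += w if (h >> i) & 1 else -w
    fp = 0
    for i, v in enumerate(acc):
        if v > 0:
            fp |= 1 << i
    return fp


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class SimhashIndex:
    """Inverted index over fingerprint bands for fast candidate lookup.

    With n_bands = 4, two fingerprints that differ in at most 3 bits
    must agree exactly on at least one band (pigeonhole principle).
    """

    def __init__(self, bits=64, n_bands=4):
        self.bits = bits
        self.n_bands = n_bands
        self.buckets = defaultdict(set)  # (band index, band value) -> doc ids
        self.fps = {}                    # doc id -> fingerprint

    def _bands(self, fp):
        width = self.bits // self.n_bands
        mask = (1 << width) - 1
        return [(i, (fp >> (i * width)) & mask) for i in range(self.n_bands)]

    def add(self, doc_id, fp):
        self.fps[doc_id] = fp
        for key in self._bands(fp):
            self.buckets[key].add(doc_id)

    def query(self, fp, max_distance=3):
        """Return ids of indexed documents within `max_distance` bits of fp."""
        candidates = set()
        for key in self._bands(fp):
            candidates |= self.buckets.get(key, set())
        return sorted(
            d for d in candidates if hamming(fp, self.fps[d]) <= max_distance
        )
```

In a pipeline of the kind the abstract describes, the ids returned by `query` would be the candidate set passed on to the more expensive CNN-based similarity model.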
Keywords/Search Tags: NLP, Simhash, CNN, Text Similarity, News Recommendation