Research On The Method Of Microblog Text Similarity Calculation Based On Weighted Word2vec

Posted on: 2020-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: S D Ma
GTID: 2428330602452148
Subject: Information Science
Abstract/Summary:
With the continuous development of Internet technology, major social platforms have emerged and generated a large amount of text. Mining this text makes it possible to classify it effectively and reasonably, and to dig deeper into it to discover netizens' ideological dynamics and emotional trends. As one of the important products of Internet technology development, Weibo has a very large number of participants because of its low threshold, and social events are discussed on it very actively. Text mining analysis of Weibo is therefore of great significance.

Within text mining, text similarity calculation is the basis of many other applications: it can bring order to the large volume of disorganized text on the Internet, it is very important in natural language processing, and it provides underlying support for subsequent text mining steps. Given the characteristics of Weibo text dissemination, calculating the similarity of Weibo texts well is a great challenge. To address this problem, this paper takes advantage of word vectors and, based on an analysis of the overall framework of text similarity calculation, proposes a new method that considers both the semantic information and the statistical information of a text when calculating the similarity between texts. The modules of the text similarity calculation model are designed and described in detail.

The work and results of this paper are as follows:

1. The research history and current status of text similarity calculation are discussed. In combination with the actual research background, the definition and connotation of text similarity, common text representation methods, and several classic text similarity calculation models are presented, and the word vector technique used in this paper is explained in detail, which lays a good foundation for building the model.
2. Based on a summary and analysis of existing text similarity calculation methods, drawn from extensive reading of the relevant literature, the motivation for the text similarity calculation method based on weighted Word2vec is presented, and the overall framework of the method is summarized.
3. The technology is analyzed. Using the advantage that word vectors trained in a big data environment represent word semantics well, the text features are analyzed and the whole model is constructed. The functions and implementation flow of key technologies such as text preprocessing, text feature vector acquisition, and similarity calculation are also analyzed in detail.
4. Based on the above theoretical research and technical analysis, the proposed text similarity calculation method is applied. On collected microblog experimental data set 1, a text classification study was carried out, and the feasibility of the proposed method was verified by comparing the experimental results. On collected microblog experimental data set 2, K-means was used with the proposed similarity method to cluster microblog texts, and the highest-frequency words in each cluster were extracted to represent that cluster's theme.
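The abstract does not state the weighting formula, so the following is only a minimal sketch of one plausible reading: each text is represented by the TF-IDF-weighted average of its word vectors (statistical information combined with semantic information), and two texts are compared by cosine similarity. The toy three-dimensional vectors, the smoothed IDF variant, and the names `TOY_VECTORS`, `idf_table`, `weighted_doc_vector`, and `cosine` are all assumptions made for illustration; they are not taken from the thesis, and real vectors would come from a trained Word2Vec model.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "weibo": [0.9, 0.1, 0.0],
    "microblog": [0.8, 0.2, 0.1],
    "post": [0.7, 0.3, 0.0],
    "football": [0.0, 0.9, 0.2],
    "match": [0.1, 0.8, 0.3],
    "score": [0.0, 0.7, 0.4],
}


def idf_table(docs):
    """Smoothed inverse document frequency for every word in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def weighted_doc_vector(doc, idf, vectors):
    """TF-IDF-weighted average of the word vectors of one tokenized text."""
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total_weight = 0.0
    for word, count in Counter(doc).items():
        if word not in vectors:
            continue  # skip out-of-vocabulary words
        weight = (count / len(doc)) * idf.get(word, 0.0)
        for i, x in enumerate(vectors[word]):
            acc[i] += weight * x
        total_weight += weight
    if total_weight == 0.0:
        return acc  # all-zero vector when no word is covered
    return [x / total_weight for x in acc]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Three already-tokenized toy texts (assumed output of preprocessing).
docs = [
    ["weibo", "post", "microblog"],
    ["microblog", "post"],
    ["football", "match", "score"],
]
idf = idf_table(docs)
doc_vectors = [weighted_doc_vector(d, idf, TOY_VECTORS) for d in docs]

sim_related = cosine(doc_vectors[0], doc_vectors[1])
sim_unrelated = cosine(doc_vectors[0], doc_vectors[2])
print(sim_related, sim_unrelated)
```

With these toy inputs the two microblog-related texts score higher than the microblog/football pair. The same document vectors could then be passed to a classifier or to K-means, as in the experiments the abstract describes.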
Keywords/Search Tags: Text similarity, TF-IDF, Distributed Representation, Word2Vec, micro-blog