
Research On Cluster Analysis Of Hot Spot Microblogs Based On Genetic Algorithm

Posted on: 2019-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Feng
Full Text: PDF
GTID: 2428330566980743
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, and especially the spread of the mobile Internet, microblogs have become an important social tool, and microblog data mining has become a hot research topic. This thesis takes Sina microblog as its research object. Through an in-depth study of microblog post text, posting time, and related attributes, an improved genetic clustering algorithm is designed to mine hot microblog topics. The main work consists of the following three aspects:

(1) To address the excessive dimensionality of word vectors when short microblog posts are represented in the vector space model, a word vector library for the posts is obtained through the neural network model in Word2Vec. This solves the key problem of weak correlation between word vectors in the TF-IDF algorithm and converts microblog post text into numeric vectors.

(2) The heat of a microblog post drops toward zero as time passes. Following the principle of the simulated annealing algorithm, a heat decay model for microblogs is designed. In a large number of experiments, the accuracy of the heat decay model is above 80%, which is better than that of similar analyses.

(3) To address the shortcomings of fuzzy clustering algorithms, an incremental fuzzy clustering algorithm based on genetic annealing is designed, which can cluster newly arriving microblog posts at any time. The algorithm is evaluated with average precision, average recall, and average F value. Its accuracy is 82.3%, which is higher than that of other microblog topic extraction methods.

Finally, a large volume of Sina microblog data is preprocessed through word segmentation, vectorization of post text, and processing of time and related attributes to form microblog vectors, and the improved genetic clustering algorithm is used to analyze hot microblog topics. Compared with the microblog hot search rankings for the same time period, 6 of the top 8 hot topics were consistent with the experimental results, an accuracy rate of 75%.
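
The evaluation measures named in the abstract (average precision, average recall, average F value, and the share of top-ranked hot topics that were recovered) can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions used in the thesis, so the per-topic averaging, the function names, and the toy data below are all assumptions made for illustration only.

```python
def prf(predicted, relevant):
    """Precision, recall and F value for one topic, given collections of post ids."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


def macro_average(clusters, labels):
    """Average the per-topic scores over every topic in `labels` (assumed scheme)."""
    scores = [prf(clusters.get(topic, ()), posts) for topic, posts in labels.items()]
    if not scores:
        return 0.0, 0.0, 0.0
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


def top_k_match_rate(found_topics, ranked_topics, k):
    """Share of the top-k ranked topics that also appear among the found topics."""
    found = set(found_topics)
    top = ranked_topics[:k]
    return sum(1 for topic in top if topic in found) / len(top) if top else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: topic name -> ids of posts assigned to that topic.
    clusters = {"t1": [1, 2], "t2": [3]}
    labels = {"t1": [1, 2], "t2": [3, 4]}
    print(macro_average(clusters, labels))

    # Six of the eight top-ranked topics recovered, as in the abstract.
    ranking = ["a", "b", "c", "d", "e", "f", "g", "h"]
    print(top_k_match_rate(["a", "b", "c", "d", "e", "f"], ranking, 8))  # 0.75
```

With six of eight ranked topics recovered, `top_k_match_rate` returns 0.75, which matches the 6-of-8 comparison reported above.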
Keywords/Search Tags:Hot microblog, Word2Vec, Simulated annealing, Genetic algorithm, Clustering analysis