
Research On Hot Word Analysis Technology For Microblog Text

Posted on: 2020-11-20  Degree: Master  Type: Thesis
Country: China  Candidate: R Wu  Full Text: PDF
GTID: 2438330596997509  Subject: Electronic and communication engineering
Abstract/Summary:
As information dissemination on the Internet diversifies, Weibo has become a popular social networking platform on which information is published, spread, and received in real time, so users can learn about hot topics almost as soon as they emerge. Because Weibo posts are textual expressions of events and emotions, natural language processing techniques can be applied to detect Weibo hotspots automatically. However, owing to the real-time nature of Weibo, its text must be processed differently from conventional text, which makes mining hot topics from Weibo data a problem of considerable significance.

This thesis analyzes the data characteristics of Weibo, gives a quantifiable definition of hot words, and proposes an efficient hot word analysis algorithm for mining real-time hot topics. The algorithm first preprocesses the Weibo data and uses variance to eliminate the interference of high-frequency words with the subsequent hot word analysis. It then adapts Newton's law of cooling from thermodynamics to discover hot words in Weibo, and proposes a dynamic-threshold culling rule to remove low-frequency words whose frequency changes at a high rate. Based on the discovered hot words, the relationships among them are extracted: left and right information entropy and mutual information are used to associate hot words. Finally, a word co-occurrence model is introduced to perform a secondary association, so that sets of hot words expressing the same hot topic are merged and output as the final hot topics.

The proposed algorithm is suited to Sina Weibo data and is evaluated experimentally on actual data. The results show that the algorithm identifies hot topics with an accuracy of 71.23% and that the error rate is kept within 8.17%, which makes hot word analysis a more reasonable and efficient approach to real-time hot topic mining.
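The abstract does not give the thesis's exact formula, so the following is only a minimal sketch of one common way to adapt Newton's law of cooling to hot word discovery. It assumes word counts from two consecutive time windows and solves T(t) = T0 * exp(-alpha * dt) for the rate, with the sign flipped so that rising words score positive. The names `heat_score`, `hot_words`, `min_count`, and `top_k` are hypothetical, and the fixed `min_count` cutoff is a stand-in for the dynamic threshold, whose definition is not stated here.

```python
import math


def heat_score(prev_count, curr_count, hours=1.0):
    """Rate of frequency change per hour, derived from Newton's law of cooling.

    Solving T = T0 * exp(-alpha * dt) for alpha gives alpha = -ln(T / T0) / dt.
    The sign is flipped here so that a growing word gets a positive score.
    Add-one smoothing (an assumption) avoids division by zero and log(0).
    """
    return math.log((curr_count + 1) / (prev_count + 1)) / hours


def hot_words(prev, curr, min_count=5, top_k=3):
    """Rank words in the current window by heat score.

    Words seen fewer than min_count times are dropped, because a jump from
    0 to 3 occurrences gives a large rate without indicating a real hotspot.
    """
    scores = {}
    for word, count in curr.items():
        if count < min_count:
            continue
        scores[word] = heat_score(prev.get(word, 0), count)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy counts (invented for illustration) from two consecutive windows.
prev_window = {"the": 1000, "earthquake": 4, "rare": 0}
curr_window = {"the": 1010, "earthquake": 99, "rare": 3}
ranking = hot_words(prev_window, curr_window)
```

In this toy input, a consistently frequent word such as "the" scores close to zero because its count barely changes, which is the same effect the variance-based filtering step is meant to achieve before hot word discovery.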
Keywords/Search Tags: hot words, Newton's law of cooling, left and right information entropy, mutual information, co-occurrence model
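Two of the association measures named above, left and right information entropy and mutual information, can be illustrated with their standard textbook definitions. This is a sketch under assumptions, not the thesis's implementation: it works on an already segmented token list, uses adjacent-token bigrams for mutual information (pointwise form), and the function names are hypothetical.

```python
import math
from collections import Counter


def branch_entropy(neighbors):
    """Shannon entropy (in bits) of the distribution of neighboring tokens."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def left_right_entropy(tokens, word):
    """Entropy of the tokens appearing immediately left and right of word.

    Low entropy on one side suggests the word usually attaches to the same
    neighbor there, so the two may belong to one hot phrase.
    """
    left = [tokens[i - 1] for i, t in enumerate(tokens) if t == word and i > 0]
    right = [tokens[i + 1] for i, t in enumerate(tokens)
             if t == word and i < len(tokens) - 1]
    return branch_entropy(left), branch_entropy(right)


def pmi(tokens, a, b):
    """Pointwise mutual information of the adjacent pair (a, b), in bits."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_ab = bigrams[(a, b)] / (n - 1)
    if p_ab == 0:
        return float("-inf")  # the pair never occurs adjacently
    return math.log2(p_ab / ((unigrams[a] / n) * (unigrams[b] / n)))


# Toy token stream (invented for illustration).
tokens = "hong kong protest hong kong airport new york".split()
kong_left, kong_right = left_right_entropy(tokens, "kong")
```

Here "kong" always follows "hong" (left entropy 0) but is followed by varied tokens (right entropy 1 bit), and the pair ("hong", "kong") has positive pointwise mutual information, so a rule combining both signals would join the two into a single hot phrase.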