Research On Microblog Hot Topic Discovery Technology Based On Frequent Word Sets

Posted on: 2022-04-06  Degree: Master  Type: Thesis
Country: China  Candidate: M Y Liu  Full Text: PDF
GTID: 2518306764993289  Subject: Internet Technology
Abstract/Summary:
With the rapid spread of the Internet and the rise of social media, the information disseminated online has become both massive in volume and real-time in nature. Weibo, an open social networking platform widely used in China in recent years, plays an irreplaceable role in releasing information and expressing public opinion. Accurate and effective mining of hot topics on Weibo helps government departments understand current public opinion in a timely manner, formulate response plans, and foster a harmonious and healthy online environment for citizens. Hot topic mining on Weibo is therefore a task of great value, but it also faces many challenges. Because microblog texts are short and carry little information, traditional hot topic discovery methods produce text representations that lack semantic information and give poor results when applied to microblog text. This thesis makes improvements in text representation, hidden topic mining, and hot topic evaluation, and proposes a practical and effective method for microblog hot topic discovery. The main research contents are as follows:

(1) To address the problem that the traditional statistics-based frequent word set representation cannot account for low-frequency words and ignores the semantic relationships between words, a dual text representation model based on frequent word sets and BERT semantics (FWS-BERT) is proposed. The model mines frequent word sets from Weibo texts and represents the texts as BERT sentence vectors, constructs a text similarity matrix by weighted fusion, and establishes a semantic fusion mechanism based on frequent word sets and BERT sentence vectors to mine deeper-level information from short texts. Weibo topics are then grouped with a clustering algorithm. Experimental results show that topic clustering with this model achieves high scores on both the silhouette coefficient and the Calinski-Harabasz (CH) index.

(2) To better mine the hidden topics under each topic, this thesis proposes a microblog topic detection method (FWS-AP) based on frequent word sets and affinity propagation (AP) clustering with an improved similarity measure. To improve the clustering performance of the AP algorithm, the MinHash algorithm is introduced to replace the original Euclidean distance metric in AP clustering, and the frequent word sets under each topic are clustered to mine the hidden topics under different topics effectively.

Finally, by analyzing the structure of microblog data and the way topics spread, and by comprehensively considering the factors that influence the popularity of microblog topics, a method for evaluating hot topics on Weibo is proposed. It introduces the H-index from bibliometrics and computes a topic popularity value along two dimensions: topic word popularity and user participation. Experiments verify that the FWS-AP method can effectively detect hidden topics under different Weibo topics, and that the results of the hot topic evaluation method are consistent with the real Sina Weibo topic popularity ranking.
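The abstract does not give the formulas behind these steps, so the following Python sketch only illustrates the three building blocks it names: weighted fusion of two similarity scores, a MinHash estimate of the Jaccard similarity between frequent word sets (the kind of measure that could stand in for Euclidean distance as the AP clustering input), and the H-index. The function names, the fusion weight `alpha`, the number of hash functions, and the MD5-based hashing are all assumptions made for illustration, not details taken from the thesis.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity between two word sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuse(word_set_sim, semantic_sim, alpha=0.5):
    """Weighted fusion of two similarity scores.

    alpha is an assumed weight; the thesis's actual weighting is not
    stated in the abstract.
    """
    return alpha * word_set_sim + (1.0 - alpha) * semantic_sim


def _hash(seed, word):
    """Seeded 64-bit hash built from MD5 (an illustrative choice)."""
    digest = hashlib.md5(f"{seed}:{word}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(words, num_hashes=64):
    """MinHash signature of a non-empty word set.

    For each seeded hash function, keep the minimum hash over the set.
    """
    return [min(_hash(seed, w) for w in words) for seed in range(num_hashes)]


def minhash_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def h_index(counts):
    """Largest h such that at least h items have a count of at least h."""
    h = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```

As one hypothetical use, `h_index` could be applied to the per-post repost or comment counts of a topic as a user-participation signal. How the thesis actually combines it with topic word popularity is not stated in the abstract.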
Keywords/Search Tags: Weibo, frequent word sets, BERT, AP clustering, hot topics