
Research On Topic Detection And Tracking Method Of Microblog

Posted on: 2014-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: X P Feng
Full Text: PDF
GTID: 2268330422963434
Subject: Information security
Abstract/Summary:
Since microblogging emerged in 2006, major microblog platforms have launched one after another and the number of users has kept growing. As of July 2012, Facebook had more than one billion registered users, Twitter more than 500 million, and Sina and Tencent microblogs also more than 300 million. Every day, hundreds of millions of messages are posted on these platforms, which overwhelms users. Topic detection and tracking can help people identify hot topics and follow their latest developments. To this end, this thesis studies a topic detection and tracking method for microblogs.

First, a grouping-based topic detection method for microblogs (GBTD) is proposed. All texts are divided into multiple groups. Within each group, the ISPC method clusters the texts into many small topic sets, and the single-pass algorithm then clusters all of the small topics; the final clustering result is the set of newest topics. ISPC sorts all texts by their high-frequency word hit rate from high to low and then clusters them with a single-pass clustering algorithm. If the number of consecutively generated new categories exceeds a threshold, detection stops immediately; otherwise it continues until all input data have been processed. GBTD, single-pass, and ISPC were run on collected microblog data, and the results show that GBTD achieves an accuracy of 95% and a recall of 90%.

Second, a double-core topic tracking method for microblogs (DCTT) is proposed. A core is built for each newly generated topic. Each old topic has two cores: the old core, which represents the core of the topic's texts that are far from the present, and the new core, which represents the core of the texts that are close to the present. DCTT calculates the similarity between the new topic's core and each core of the old topic. If the similarity exceeds the threshold, the new topic is merged into the old topic: the previous old core and new core together form the current old core, and the new topic's core becomes the current new core. Otherwise, the new topic is added to the old topic list as a new entry whose old core is empty and whose new core is the new topic's core. The results show that DCTT achieves an accuracy of 97% and a recall of 93%...
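
The detection pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not define the text representation, the similarity measure, or the "high-frequency word hit rate", so the sketch assumes whitespace-tokenized bag-of-words vectors, cosine similarity, and a hit rate equal to the share of a text's tokens that fall among the `top_k` most frequent words. The names `single_pass`, `ispc`, `gbtd` and all parameter names and defaults are hypothetical.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(vectors, sim_threshold=0.5, max_new_run=None):
    """Single-pass clustering: each vector joins its most similar cluster,
    or starts a new cluster if no similarity reaches sim_threshold.
    If max_new_run is set, stop as soon as the number of consecutively
    created clusters exceeds it (the early-stop rule described for ISPC)."""
    clusters = []  # each cluster is a Counter summing its members' vectors
    new_run = 0
    for vec in vectors:
        best, best_sim = None, 0.0
        for c in clusters:
            s = cosine(vec, c)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= sim_threshold:
            best.update(vec)  # merge the text into the matching cluster
            new_run = 0
        else:
            clusters.append(Counter(vec))
            new_run += 1
            if max_new_run is not None and new_run > max_new_run:
                break  # remaining texts are left unprocessed
    return clusters


def ispc(texts, top_k=3, sim_threshold=0.5, max_new_run=2):
    """Sort texts by high-frequency word hit rate, then cluster single-pass."""
    docs = [Counter(t.lower().split()) for t in texts]  # assumed tokenizer
    freq = Counter()
    for d in docs:
        freq.update(d)
    hot = {w for w, _ in freq.most_common(top_k)}

    def hit_rate(d):
        total = sum(d.values())
        return sum(n for w, n in d.items() if w in hot) / total if total else 0.0

    ordered = sorted(docs, key=hit_rate, reverse=True)
    return single_pass(ordered, sim_threshold, max_new_run)


def gbtd(texts, group_size=100, top_k=3, sim_threshold=0.5, max_new_run=2):
    """Run ISPC per group, then cluster the resulting small topics."""
    small_topics = []
    for i in range(0, len(texts), group_size):
        group = texts[i:i + group_size]
        small_topics.extend(ispc(group, top_k, sim_threshold, max_new_run))
    return single_pass(small_topics, sim_threshold)
```

Real microblog text, especially Chinese text, would need proper word segmentation and stop-word removal in place of `split()`, and the thresholds here are placeholders rather than values reported in the thesis.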
Keywords/Search Tags:Microblog, Topic detection, Topic tracking, Single-pass
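
The double-core tracking step (DCTT) summarized in the abstract can be illustrated with the short sketch below. It is a hedged approximation only: the abstract does not say how a core is represented or how the two cores are combined, so the sketch assumes that cores are bag-of-words counters, that similarity is cosine similarity, that a new topic is compared against both cores and the larger similarity is used, and that combining the old and new cores means summing their counts. `TrackedTopic`, `track`, and the default threshold are hypothetical names and values.

```python
from collections import Counter
from dataclasses import dataclass, field
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words cores (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class TrackedTopic:
    # Core of the older texts; empty when the topic has just been added.
    old_core: Counter = field(default_factory=Counter)
    # Core of the most recent texts.
    new_core: Counter = field(default_factory=Counter)


def track(topics, incoming_core, threshold=0.5):
    """Merge incoming_core into the best-matching tracked topic, or append it
    as a new entry. Returns the index of the topic that now holds it."""
    best_i, best_sim = -1, 0.0
    for i, t in enumerate(topics):
        # Compare against both cores and keep the larger similarity.
        s = max(cosine(incoming_core, t.old_core),
                cosine(incoming_core, t.new_core))
        if s > best_sim:
            best_i, best_sim = i, s
    if best_i >= 0 and best_sim >= threshold:
        t = topics[best_i]
        t.old_core = t.old_core + t.new_core  # previous cores form the old core
        t.new_core = Counter(incoming_core)   # incoming core becomes the new core
        return best_i
    # No match: new entry with an empty old core.
    topics.append(TrackedTopic(new_core=Counter(incoming_core)))
    return len(topics) - 1
```

Keeping two cores lets a fresh topic match either the long-running vocabulary of an old topic or its most recent wording, which is presumably why the abstract distinguishes texts far from and close to the present.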