Research On The Method Of Micro-blog Topic Detection And Tracking

Posted on: 2017-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: J F Liu
Full Text: PDF
GTID: 2348330503989887
Subject: Computer software and theory
Abstract/Summary:
Micro-blogging, one of the most popular social applications, is a main channel for acquiring and disseminating information. Micro-blog data is in effect a high-speed, massive, and dynamic information stream, and detecting and tracking topics in it is important for public opinion monitoring and public opinion surveys. Against this background, this thesis proposes a clustering algorithm that can handle large-scale data streams, applies it to micro-blog topic detection and tracking, and achieves good results.

The proposed clustering algorithm, APMStream (Affinity Propagation in Massive Data Stream), is based on affinity propagation. Its main steps are initial clustering, online clustering, cluster adjustment, and cluster maintenance. The AP (Affinity Propagation) algorithm is improved in two respects: distributed iteration and dynamic adjustment of the damping coefficient. Online clustering either merges an incoming tuple into an existing cluster or creates a new cluster containing that tuple. Cluster adjustment first redetermines the center of each existing cluster and then clusters those centers with the weighted affinity propagation algorithm. Cluster maintenance keeps the system load within a reasonable range by removing tuples and clusters of low importance.

APMStream is then applied to micro-blog topic detection and tracking. The importance of a micro-blog serves as the preference parameter of the AP algorithm and determines the probability that the micro-blog becomes a cluster center. The distance between micro-blogs is calculated from their common word chunks.

APMStream is designed as an Apache Storm topology, with data processed on each node of the topology. Experiments show that APMStream can process large-scale micro-blog data streams quickly, detect micro-blog topics, and reflect how those topics evolve over time.
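The online clustering step described above (merge an incoming tuple into an existing cluster, or create a new cluster for it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not define the word-chunk distance or the merge rule, so the Jaccard distance over whitespace-separated words, the `threshold` value, and the names `word_distance`, `Cluster`, and `online_step` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def word_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance over shared words.

    Assumption: a stand-in for the 'common word chunks' distance,
    which the abstract names but does not define.
    """
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


@dataclass
class Cluster:
    center: set[str]  # word set of the post acting as cluster center
    members: list[set[str]] = field(default_factory=list)


def online_step(clusters: list[Cluster], post: str, threshold: float = 0.7) -> int:
    """Merge the post into the nearest cluster, or open a new one.

    Returns the index of the cluster that received the post.
    The threshold is an assumed tuning parameter.
    """
    words = set(post.lower().split())

    # Find the existing cluster whose center is closest to the post.
    best_idx, best_dist = -1, float("inf")
    for i, cluster in enumerate(clusters):
        d = word_distance(words, cluster.center)
        if d < best_dist:
            best_idx, best_dist = i, d

    # Close enough: merge into that cluster.
    if best_idx >= 0 and best_dist <= threshold:
        clusters[best_idx].members.append(words)
        return best_idx

    # Otherwise the post starts a new cluster and becomes its center.
    clusters.append(Cluster(center=words, members=[words]))
    return len(clusters) - 1
```

In this sketch a post only ever compares against cluster centers, which keeps the per-tuple cost proportional to the number of clusters rather than the number of posts seen so far. Re-electing centers and discarding low-importance clusters would belong to the separate adjustment and maintenance steps.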
Keywords/Search Tags:Topic detection, Topic tracking, Distributed Stream Processing, Micro-blog topic, Affinity Propagation algorithm