Research On Hot Topic Detection Methods For Microblog

Posted on: 2014-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Li
GTID: 2268330401969485
Subject: Computer application technology
Abstract/Summary:
Micro-blogging has attracted wide participation and changed the way people obtain information since it first appeared. In recent years many breaking stories have been released first on micro-blogs, so finding hot topics on micro-blogs is of considerable importance. However, micro-blog data is voluminous and scattered, and traditional text processing methods are poorly suited to it. This thesis studies how to find hot topics in micro-blogs. Based on an analysis of micro-blog features, we use text clustering to find hot topics. The main contributions are as follows:

1. A preprocessing algorithm for micro-blogs. According to their content, micro-blogs are divided into three forms: primary, forward, and comment. Forward and comment micro-blogs are merged into the primary micro-blog they derive from. Because the large number of forwards and comments on the network all derive from primary micro-blogs, merging them reduces the size of the text collection without affecting the result of hot topic detection. In addition, a topic becomes hot because many users forward and comment on it, so the merged micro-blogs form pre-hot topics.

2. An algorithm for reducing the dimension of the micro-blog text vector space. Because micro-blogs contain many noise words, we borrow from word-frequency statistics: after word segmentation and stop-word removal, we count the frequency of every word and remove low-frequency words, which further reduces the scale of the text.

3. A text clustering method, TW-SPHC, that combines Single-Pass clustering, constrained by pre-hot topics and a time window, with hierarchical clustering. Hot topics attract a large following, and topics that are close in time are highly similar. To limit the number of comparisons, pre-hot topics are selected by weight and a time window is set in the Single-Pass clustering. Hierarchical clustering then merges similar topics as far as possible, and isolated topic clusters are deleted between the two clustering stages. This yields the micro-blog topic clusters, from which the hot topic clusters are finally selected according to their weight.

4. An implementation of the above methods and a study of the algorithm's performance. Extensive experiments show that the method finds hot topics in large-scale micro-blog text accurately and quickly, and that it has high practical value.
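
The abstract gives no formulas or parameter values, so the following Python sketch only illustrates the general shape of the first stages described above: merging forwards and comments into primary posts, dropping low-frequency words, and a time-windowed Single-Pass clustering. Everything specific here is an assumption made for illustration and is not taken from the thesis: the names `Post`, `merge_into_primary`, `drop_rare_terms`, and `single_pass`, the whitespace tokenizer (Chinese text would need a word segmenter), cosine similarity over raw term counts, the use of forward/comment counts as weight, and the default `threshold`, `window`, and `min_count` values. The hierarchical merging stage and the deletion of isolated clusters are omitted.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    post_id: str
    text: str
    timestamp: float                 # seconds (assumed unit)
    parent_id: Optional[str] = None  # set for forwards and comments


def merge_into_primary(posts):
    """Fold forwards/comments into their primary post; weight counts the merged posts."""
    merged = {}
    for p in posts:
        if p.parent_id is None:
            merged[p.post_id] = {"tokens": p.text.lower().split(),
                                 "time": p.timestamp, "weight": 1}
    for p in posts:
        # Posts whose parent is not a known primary are ignored in this sketch.
        if p.parent_id in merged:
            merged[p.parent_id]["tokens"] += p.text.lower().split()
            merged[p.parent_id]["weight"] += 1
    return sorted(merged.values(), key=lambda d: d["time"])


def drop_rare_terms(docs, min_count=2, stop_words=frozenset()):
    """Remove stop words and words whose corpus frequency is below min_count."""
    counts = Counter(t for d in docs for t in d["tokens"] if t not in stop_words)
    for d in docs:
        d["tokens"] = [t for t in d["tokens"]
                       if t not in stop_words and counts[t] >= min_count]
    return docs


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs, threshold=0.3, window=3600.0):
    """Single-Pass clustering that only compares against recently updated clusters."""
    clusters = []
    for d in docs:                       # docs must be in time order
        vec = Counter(d["tokens"])
        if not vec:
            continue                     # nothing left after term filtering
        best, best_sim = None, threshold
        for c in clusters:
            if d["time"] - c["last_time"] > window:
                continue                 # outside the time window: skip comparison
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": vec, "last_time": d["time"],
                             "weight": d["weight"]})
        else:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["last_time"] = d["time"]
            best["weight"] += d["weight"]
    return sorted(clusters, key=lambda c: c["weight"], reverse=True)
```

A typical call chain would be `single_pass(drop_rare_terms(merge_into_primary(posts)))`, with the heaviest clusters at the front of the returned list treated as hot-topic candidates. Restricting comparisons to clusters updated within `window` is what keeps the number of similarity computations bounded as the stream grows.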
Keywords/Search Tags: micro-blog mining, hot topic, text clustering, text similarity