Research On Hot Topic Discovery Based On The Characteristics Of Microblog

Posted on: 2017-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Tang
GTID: 2348330488990767
Subject: Computer technology
Abstract/Summary:
With the spread of smart terminal devices and the growth of the mobile Internet, accessing the Internet has become increasingly convenient, and people's demand for social interaction has grown with it. As a result, social software has developed rapidly, including widely used products such as Tencent QQ, WeChat, Easychat and Weibo (microblog). Among these, microblog attracts a large user base through its openness, real-time nature and rapid propagation. Its huge number of users generates a massive data stream in which much information is buried. How to extract meaningful information from this volume of data and find the hot topics that interest people has therefore become an important aspect of microblog research.

Based on the features of microblog, this thesis analyzes the features of its users and its data and presents an IH (Influence-Hot) data filtering method for microblog hot topic discovery. It also studies traditional topic discovery methods and analyzes their respective merits and deficiencies. Finally, it proposes ISP-LDA, a microblog hot topic discovery method that combines improved ISP (Incomplete Single-Pass) clustering with the LDA (Latent Dirichlet Allocation) model. The main contents of this thesis are as follows:

1) An IH data filtering method based on the features of microblog is proposed. After analyzing the features of microblog users and the roles that different users play on the platform, the thesis proposes a method for calculating the microblog user influence factor, Uinfluence. Based on an analysis of the characteristics of microblog data, it also presents a method for calculating the microblog spread heat, Wht. Finally, by integrating user influence and spread heat, the thesis derives the IH data filtering method from microblog's own features. The method greatly reduces the data size without reducing the effectiveness of the data, which reduces the time complexity of the whole topic discovery process.

2) An ISP-LDA microblog hot topic discovery method is proposed, combining the improved incomplete single-pass clustering method with the LDA model. After studying several traditional topic detection methods and analyzing their respective merits and deficiencies, the thesis proposes an improved incomplete single-pass (ISP) clustering method that incorporates the relevant features of microblog data. LDA modeling is then applied to the output of the ISP method to obtain microblog hot topics. ISP-LDA combines the processing speed of single-pass clustering with the accuracy of LDA topic representation.

The experimental results show that, when discovering hot topics in large-scale microblog data, ISP-LDA is more accurate than the existing single-pass clustering and LDA model methods, and its time cost is significantly lower.
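
For readers unfamiliar with the clustering stage, the following is a minimal sketch of classic single-pass clustering, the baseline that the thesis's ISP method improves on. It is not the thesis's ISP algorithm: the abstract does not specify the "incomplete" variant, the IH filter formulas, or the LDA stage, so none of those are reproduced here. The function names, the whitespace tokenizer, the bag-of-words cosine similarity and the threshold value of 0.3 are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(posts, threshold=0.3):
    """Classic single-pass clustering (assumed baseline, not the thesis's ISP).

    Each post is compared once against the existing cluster centroids and
    joins the most similar one if the similarity reaches the threshold;
    otherwise it opens a new cluster. Returns a list of clusters, each a
    list of post indices.
    """
    centroids = []  # summed term counts per cluster
    clusters = []   # post indices per cluster
    for idx, text in enumerate(posts):
        vec = Counter(text.lower().split())  # naive whitespace tokenizer
        best, best_sim = None, threshold
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            centroids.append(Counter(vec))
            clusters.append([idx])
        else:
            centroids[best].update(vec)  # fold the post into the centroid
            clusters[best].append(idx)
    return clusters
```

Because every post is read only once, the cost grows with the number of posts times the number of clusters, which is the speed advantage the abstract attributes to single-pass clustering. In a pipeline like the one described, a topic model such as LDA would then be fitted on the resulting clusters.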
Keywords/Search Tags: Microblog features, Topic discovery, LDA model, Data filtering, Incomplete single-pass clustering