Research Of Hot Topics Prediction Based On Structural Features Of Micro-blog

Posted on: 2016-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: D H Li
Full Text: PDF
GTID: 2308330461492749
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, the cost of communication between people has fallen sharply. Because information barriers have been broken down, vast amounts of information can be transmitted to anywhere in the world with Internet coverage. In the Internet era, many social network services, represented by Twitter and Sina Weibo, have emerged. On Sina Weibo, hundreds of millions of users take part in producing and disseminating information. This volume of information gives rise to many hot topics, so how to effectively discover and predict hot topics on Weibo is an important research direction in data mining.

This thesis studies the basic structure and transmission characteristics of the Sina Weibo platform in depth and identifies three main difficulties in detecting and predicting hot topics on it. The first is the complexity of Chinese semantics: under the 140-character limit, Sina Weibo messages are mostly short texts. In addition, micro-blog wording is often non-standard and contains a great deal of Internet slang, which makes it very difficult to analyze the semantic information in Chinese micro-blogs. The second is the speed of dissemination: with the spread of smartphones and the mobile Internet, more and more people log in to Sina Weibo from mobile devices, so they can post or forward a micro-blog that interests them anytime and anywhere, which makes messages spread faster. The third is the sheer volume of data: more than 100 million micro-blogs are posted on Sina Weibo every day, which poses a major challenge for data acquisition and analysis.

To address these problems, this thesis tracks the spreading process of several hot topics on Sina Weibo, proposes a method for discovering and predicting hot topics based on the structural information and propagation characteristics of Sina Weibo, and designs a system that implements it. The method takes opinion leaders' micro-blogs as its entry point and locates the outbreak point of hot topics at opinion leaders. First, users' micro-blog data are collected and hot micro-blogs are screened out by a discriminant model; then topics are extracted using a word co-occurrence graph model; finally, the trend of each topic is predicted by an improved SIR model.

Finally, experiments are conducted to test and verify the hot micro-blog discriminant model, the improved SIR model, and the overall performance of the prediction system. The experimental results show that the discriminant algorithm can effectively extract valuable data and filter out junk information, and that the improved SIR model can predict hot topics accurately and in time. However, the overall performance of the system indicates that it still has considerable room for improvement in timeliness and comprehensiveness.
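
For readers unfamiliar with the SIR model mentioned above, the following is a minimal sketch of the classical (unmodified) SIR dynamics applied to topic spreading. It is not the improved model of the thesis, whose details the abstract does not give. The mapping of S, I and R to users who have not yet seen a topic, users who are actively forwarding it, and users who have lost interest is an assumption made for illustration, as are the function names and all parameter values.

```python
def simulate_sir(population, initial_spreaders, beta, gamma, steps, dt=0.1):
    """Integrate the classical SIR equations with forward Euler steps.

    Assumed interpretation (not taken from the thesis):
      S = users who have not yet seen the topic
      I = users currently forwarding the topic
      R = users who have lost interest
    beta is the contact/forwarding rate, gamma the rate of losing interest.
    """
    s = float(population - initial_spreaders)
    i = float(initial_spreaders)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_spreaders = beta * s * i / population * dt  # S -> I
        new_removed = gamma * i * dt                    # I -> R
        s -= new_spreaders
        i += new_spreaders - new_removed
        r += new_removed
        history.append((s, i, r))
    return history


def peak_spreaders(history):
    """Return (step index, spreader count) at the step where I is largest."""
    step, (_, i, _) = max(enumerate(history), key=lambda item: item[1][1])
    return step, i


if __name__ == "__main__":
    # Hypothetical values chosen only to show an outbreak (beta > gamma).
    run = simulate_sir(population=10_000, initial_spreaders=10,
                       beta=0.5, gamma=0.1, steps=1000)
    step, peak = peak_spreaders(run)
    print(f"spreaders peak at step {step} with about {peak:.0f} users")
```

In this sketch a topic only "breaks out" when beta exceeds gamma; otherwise the number of spreaders declines from the first step. Predicting a trend would then amount to fitting beta and gamma to the early forwarding counts of a topic, which is one plausible reading of how such a model is used, not a description of the thesis's system.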
Keywords/Search Tags:Sina Weibo, Hot Topics, Propagating Regularity, Discriminant Model, Prediction, SIR