
Popularity Prediction Based On Microblog Mining

Posted on: 2017-05-11 | Degree: Master | Type: Thesis
Country: China | Candidate: K Li
GTID: 2308330485484513 | Subject: Computer software and theory
Abstract/Summary:
In recent years, the rapid development of Web 2.0 and the mobile internet has produced a large number of social network platforms. Microblogs are one of them. Their convenience, originality, interactivity and grassroots character have attracted a massive user base, and they have become an important platform for acquiring and sharing information in daily life. Predicting the popularity of microblog content in a timely and accurate way matters for content recommendation, advertising, marketing and public opinion monitoring. This thesis studies popularity prediction on Sina Weibo. The main contributions are as follows:
1. The thesis analyzes how content, temporal and network factors influence microblog propagation. The experimental results show the following patterns. Tweets without links are more likely to be retweeted. The more users a tweet mentions, the fewer retweets it tends to receive. Tweets posted in different time periods receive very different numbers of retweets. The number of retweets shows some negative correlation with the minimum retweet time interval. Tweets with a large amount of early exposure tend to receive more retweets. Finally, the popularity of a tweet has a significant negative linear correlation with the link density of its early retweet network.
2. The thesis analyzes the weaknesses of the features commonly used in existing research on microblog popularity prediction and, based on this analysis, extracts a series of new features. The new features, combined with the classic ones, are used to train four classification models (logistic regression, naive Bayes, support vector machine and random forest), which predict the popularity range of a target tweet. Experimental results show that adding the new features improves the prediction accuracy of the four models by 1.91%, 14.80%, 2.92% and 6.92%, respectively.
3. The thesis also applies the new features to a similarity-based prediction method. The similarity between two tweets is measured by a weighted Mahalanobis distance computed from their features. The top-k historical tweets most similar to the target tweet are then selected, and their weighted average popularity is taken as the predicted popularity of the target tweet. In addition, particle swarm optimization is used for parameter selection. Experimental results show that, compared with the best results obtained using only the classic features, the new features reduce the relative absolute error by 0.0801 and improve the accuracy by 9.00%. With particle swarm optimization, the relative absolute error is reduced by 0.0640 and the accuracy is improved by 6.00%.
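The first finding relates a tweet's popularity to the link density of its early retweet network. The abstract does not define either quantity precisely, so what follows is only a minimal Python sketch under stated assumptions. It treats the early retweet network as a directed graph among a tweet's early participants and takes link density to be the standard directed-graph density |E| / (n(n - 1)). It measures linear correlation with Pearson's r. The function names and the toy numbers are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def link_density(num_nodes: int, edges: Iterable[Tuple[str, str]]) -> float:
    """Directed-graph density |E| / (n * (n - 1)).

    Assumption: this is how "link density" is defined; self-loops and
    duplicate edges are ignored, and isolated nodes still count toward n.
    """
    if num_nodes < 2:
        return 0.0
    unique_edges = {(u, v) for u, v in edges if u != v}
    return len(unique_edges) / (num_nodes * (num_nodes - 1))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples.

    Undefined (raises ZeroDivisionError) if either sample is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # 3 distinct edges among 4 nodes: 3 / (4 * 3) = 0.25
    print(link_density(4, [("u1", "u2"), ("u2", "u3"), ("u1", "u3")]))
    # Hypothetical (density, retweet count) pairs for illustration only, not thesis data.
    print(pearson([0.05, 0.10, 0.20, 0.40], [900, 600, 250, 80]))
```

A negative r on real data would correspond to the reported pattern. Whether the thesis counts edges as directed or undirected is not stated in the abstract.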
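The similarity-based predictor in item 3 can be sketched as a distance-weighted k-nearest-neighbor regressor. This Python sketch is not the thesis's implementation, and several choices in it are assumptions:

- It uses a diagonal covariance matrix, so the Mahalanobis distance reduces to a per-feature, variance-scaled distance.
- It applies per-feature weights that multiply each squared term.
- It uses inverse-distance weights for the top-k average.

All function names and the weighting scheme are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def feature_variances(history: Sequence[Vector]) -> List[float]:
    """Per-feature population variance over historical tweets (diagonal of the covariance matrix)."""
    n, dims = len(history), len(history[0])
    means = [sum(row[j] for row in history) / n for j in range(dims)]
    return [sum((row[j] - means[j]) ** 2 for row in history) / n for j in range(dims)]


def weighted_mahalanobis(x: Vector, y: Vector, variances: Vector, weights: Vector) -> float:
    """Weighted Mahalanobis distance, assuming a diagonal covariance matrix."""
    total = 0.0
    for xi, yi, var, w in zip(x, y, variances, weights):
        if var > 0:  # a constant feature cannot separate tweets, so it is skipped
            total += w * (xi - yi) ** 2 / var
    return math.sqrt(total)


def predict_popularity(target: Vector, history: Sequence[Vector], popularity: Sequence[float],
                       weights: Vector, k: int = 3, eps: float = 1e-9) -> float:
    """Inverse-distance-weighted average popularity of the k most similar historical tweets."""
    variances = feature_variances(history)
    nearest = sorted(
        (weighted_mahalanobis(target, h, variances, weights), p)
        for h, p in zip(history, popularity)
    )[:k]
    inv = [1.0 / (d + eps) for d, _ in nearest]  # eps avoids division by zero on exact matches
    return sum(w * p for w, (_, p) in zip(inv, nearest)) / sum(inv)
```

Restricting the covariance to its diagonal keeps the sketch free of dependencies. A full Mahalanobis distance would use the inverse of the complete feature covariance matrix, for example computed with NumPy.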
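Item 3 also uses particle swarm optimization (PSO) for parameter selection but does not name the tuned parameters. Below is a minimal global-best PSO sketch in Python. The inertia and acceleration coefficients (0.7, 1.5, 1.5), the swarm size, the iteration count and the box constraints are assumed defaults, not values from the thesis. In practice the objective would be some validation error of the predictor.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    iterations: int = 100,
    inertia: float = 0.7,      # assumed value
    c_personal: float = 1.5,   # assumed value
    c_global: float = 1.5,     # assumed value
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization over a box-constrained search space."""
    rng = random.Random(seed)
    dims = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, `pso_minimize(lambda p: sum((x - 1.0) ** 2 for x in p), [(-5.0, 5.0)] * 3)` should return a point close to (1, 1, 1). To tune an integer parameter such as k, one option is to round the corresponding coordinate inside the objective.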
Keywords/Search Tags: microblog, popularity prediction, feature fusion, similarity, particle swarm optimization