Predicting the Popularity of Microblogs Based on Logistic Regression

Posted on: 2018-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z A Li
Full Text: PDF
GTID: 2348330518963685
Subject: Engineering
Abstract/Summary:
A growing number of people now obtain and publish information through social networking platforms. Retweeting plays an indispensable role in these networks: by forwarding microblogs, users can spread and share information quickly across a very large audience, which greatly increases the speed and frequency of exchange between people. Predicting the popularity of tweets has therefore become an active research topic. Knowing the popularity of a newly posted tweet can have a significant impact on early warning of public opinion, hotspot mining, and commercial marketing.

The goal of this thesis is to predict the forwarding number (retweet count) of a target microblog using a logistic regression model. It proposes that the similarity between the content of different microblogs, which we call content similarity, can be used to predict the number of retweets. Content similarity can serve either as a feature of a microblog or as a regularization term in the prediction model. The research is accordingly divided into two parts.

First, the content similarity between microblogs is treated as a feature, the content similarity feature. To calculate its value for a microblog, we first compute the text similarity between that microblog and all other microblogs by the same author, then take the k most similar ones as its similar microblogs. The average forwarding number of these similar microblogs is the value of the content similarity feature. This approach was implemented and evaluated on a large number of microblogs. In the forwarding-number prediction experiments, the F1 score obtained with the content similarity feature was about 4% higher than without it, which shows that adding content similarity as a feature clearly improves prediction performance.

Second, content similarity is used as a regularization term, so that the forwarding-number prediction model combines logistic regression with content similarity regularization. During parameter training, the regularization term improves prediction accuracy by relating each microblog in the training set more closely to the forwarding numbers of similar tweets. This approach was also implemented. In the forwarding-number prediction experiments, the F1 score was about 1.2% higher than that of the prediction model without the regularization, so content similarity is more effective as a feature than as a regularization term. On the whole, however, the prediction error can be reduced by using content similarity either as a feature or as the regularization of the prediction model.
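The content similarity feature described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the abstract does not say which text similarity measure is used, so cosine similarity over whitespace-tokenized term counts is assumed here, and the names `cosine_similarity`, `content_similarity_feature`, and `history` are hypothetical. Chinese microblog text would also need a proper word segmenter instead of `split()`.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of term-count vectors (assumed measure, not from the thesis)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_similarity_feature(target: str, history: list, k: int = 3) -> float:
    """Average retweet count of the k microblogs most similar to `target`.

    `history` holds (text, retweet_count) pairs for the same author's
    other microblogs. Returns 0.0 when there is no history.
    """
    if not history or k <= 0:
        return 0.0
    # Rank the author's other microblogs by similarity to the target text.
    ranked = sorted(
        history,
        key=lambda item: cosine_similarity(target, item[0]),
        reverse=True,
    )
    top_k = ranked[:k]
    # The feature value is the mean forwarding number of the top-k neighbours.
    return sum(count for _, count in top_k) / len(top_k)


# Hypothetical data for illustration only.
history = [
    ("new phone release today", 100),
    ("phone release event", 50),
    ("lunch was great", 2),
    ("weather is nice", 4),
]
feature = content_similarity_feature("phone release", history, k=2)
```

The resulting value would then be appended to the microblog's other features before training the logistic regression model. The choice of k is left open here, as the abstract does not state the value used.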
Keywords/Search Tags: Predicting Popularity of Tweets, Retweet Predicting, Logistic regression, Content similarity