User Behavior Analysis And Hotness Prediction In Online Communities

Posted on: 2018-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Pan
Full Text: PDF
GTID: 2348330512483444
Subject: Computer technology
Abstract/Summary:
With the popularity of Web 2.0, various online communities have emerged. Online communities have characteristics that real-world communities lack, such as over-timeliness, a symbolic nature, and virtuality, and they are attracting more and more Internet users. How to extract valuable information from the huge amounts of data these communities generate has become a popular research topic. Based on a data set from Baidu Post Bar, this thesis analyzes user behavior and proposes a model for forecasting hot posts. The main work includes the following aspects:

(1) We designed and implemented a web crawler based on Scrapy and collected the data of one post bar in Baidu Post Bar during July and August. After preprocessing, the data set contains about 60,000 theme posts, 2.49 million reply posts, and information on 220,000 users. Experiments showed that the number of replies per theme post follows a power-law distribution.

(2) Based on this data set, we built a reply network among users and showed that it has the small-world and scale-free properties found in many social networks. We also analyzed user behavior in terms of active time, the number of posts users posted or received, and response delay. Finally, we clustered the users who had posted; by introducing an indicator called "average response delay", we obtained some interesting classification results and explained them in detail.

(3) We proposed a hot post forecasting model based on a time threshold T. We combined features extracted from the reply network with three other kinds of features to predict whether a post will become hot, and verified the validity of the model experimentally. Finally, contrast experiments were used to analyze the influence of different time thresholds T, different classification models, and different feature combinations on the forecasting results.
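
To make the pipeline in items (2) and (3) concrete, here is a minimal Python sketch of how a reply network and threshold-based features might be derived from crawled posts. It is not the thesis's implementation. The record layout (`author`, `created`, `replies`), the function names, the three features, and the definition of "hot" as a minimum total reply count are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter

# Hypothetical post records: timestamps are seconds, replies are (time, user).
posts = [
    {"author": "a", "created": 0,
     "replies": [(10, "b"), (20, "c"), (30, "b"), (500, "d")]},
    {"author": "b", "created": 100, "replies": [(130, "a")]},
]

def build_reply_network(posts):
    """Weighted directed edges (replier -> post author); self-replies skipped."""
    edges = Counter()
    for post in posts:
        for _, replier in post["replies"]:
            if replier != post["author"]:
                edges[(replier, post["author"])] += 1
    return edges

def in_degree(edges):
    """Number of distinct users who replied to each user."""
    degree = Counter()
    for _, target in edges:
        degree[target] += 1
    return degree

def early_features(post, T, degree):
    """Features observable within T seconds of posting (assumed feature set)."""
    early = [(t, u) for t, u in post["replies"] if t - post["created"] <= T]
    return {
        "early_replies": len(early),
        "early_users": len({u for _, u in early}),
        "author_in_degree": degree[post["author"]],
    }

def is_hot(post, min_replies):
    """Assumed label: a post is 'hot' if it eventually gets min_replies replies."""
    return len(post["replies"]) >= min_replies

edges = build_reply_network(posts)
degree = in_degree(edges)
features = early_features(posts[0], T=60, degree=degree)
label = is_hot(posts[0], min_replies=4)
```

In practice the network feature would be computed only from replies made before the prediction time, so that no later information leaks into the features; the sketch uses the whole network for brevity. The resulting feature dictionaries and labels could then be passed to any standard classifier.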
Keywords/Search Tags:Online Community, Scrapy Crawler, Reply Network, User Classification, Hot Posts Forecasting