A Study On Hot Micro-blog Forecast Based On XGBOOST And Random Forest

Posted on: 2018-07-02; Degree: Master; Type: Thesis
Country: China; Candidate: H B Di; Full Text: PDF
GTID: 2348330542481173; Subject: Software engineering
Abstract/Summary:
With the rapid development of social networking sites such as Sina Micro-Blog, more and more people are making friends online, and the volume of content on micro-blog websites is growing exponentially. In recent years, many researchers have studied micro-blogs, but most of this work centers on hot micro-blog topics; hot micro-blogs themselves have received less attention. A hot micro-blog is a micro-blog that receives the highest numbers of comments, reposts, and likes within a period of time. This thesis forecasts hot micro-blogs through feature extraction, discretization, and classification. First, the micro-blog content is cleaned and segmented into words, and a topic model is applied to extract the topic features of the text, yielding its topic distribution. Next, the non-textual features are extracted and combined with the topic features to form a comprehensive feature set. Finally, a constraint-based random forest classifier predicts the level of the number of interactions a micro-blog will receive, which in turn identifies hot micro-blogs. To improve the speed and accuracy of prediction, the thesis introduces a new discretization method based on XGBOOST that processes features using the paths that samples follow through the trees when a prediction is made. To address the shortcomings of feature selection in the traditional random forest, the thesis proposes a Constraint Random Forest: the feature set is divided into intervals according to the Pearson correlation coefficient, and the candidate feature set for each split node is drawn from these intervals in a given proportion. Experimental results show that applying the proposed classification algorithm to the discretized features substantially improves classification accuracy.
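The constrained feature selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say which variable the Pearson coefficient is computed against, where the interval boundaries lie, or what proportions are used. The sketch assumes correlation with the class label, hypothetical interval edges of 0.3 and 0.7 on the absolute coefficient, and hypothetical per-interval proportions. The function names and feature names are also invented for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (norm_x * norm_y)


def bucket_features(columns, labels, edges=(0.3, 0.7)):
    """Group feature names into intervals of |r| (edges are an assumption)."""
    buckets = [[] for _ in range(len(edges) + 1)]
    for name, values in columns.items():
        r = abs(pearson(values, labels))
        index = sum(r >= edge for edge in edges)  # number of edges passed
        buckets[index].append(name)
    return buckets


def sample_candidates(buckets, proportions, rng):
    """Draw the candidate feature set for one split node.

    Each interval contributes roughly `proportion * len(bucket)` features,
    and at least one whenever the bucket is non-empty and proportion > 0.
    """
    candidates = []
    for bucket, proportion in zip(buckets, proportions):
        if not bucket or proportion <= 0:
            continue
        k = min(len(bucket), max(1, round(proportion * len(bucket))))
        candidates.extend(rng.sample(bucket, k))
    return candidates


# Toy data: six micro-blogs with an interaction level of 0, 1, or 2.
labels = [0, 0, 1, 1, 2, 2]
columns = {
    "followers": [1, 2, 3, 4, 5, 6],
    "has_image": [0, 1, 0, 1, 1, 0],
    "topic_0": [0.9, 0.8, 0.5, 0.4, 0.2, 0.1],
    "text_length": [1, 2, 1, 2, 2, 3],
}

buckets = bucket_features(columns, labels)
candidates = sample_candidates(buckets, (0.0, 1.0, 0.5), random.Random(0))
print(buckets)
print(candidates)
```

In a full random forest, `sample_candidates` would be called once per split node in place of the uniform random feature subset that the traditional algorithm draws, so that the candidate set is weighted toward the chosen correlation intervals.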
Keywords/Search Tags:Micro-Blog, Discretization, XGBOOST, Random Forest