Research on a News Heat Prediction Method Based on Sina News Data Analysis

Posted on: 2020-08-02  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Li  Full Text: PDF
GTID: 2428330575974174  Subject: Information and Communication Engineering
Abstract/Summary:
We are in an era of explosive information growth. According to statistics, as of June 2018 the number of mobile Internet users in China had reached 788 million. This huge user base generates large amounts of data on news websites and on social platforms such as Weibo, Facebook, and WeChat. At present, there are few studies of news and its comment data, and quantitative analysis is lacking. For online news, comments are an important part of how a story spreads and develops. Compared with traditional media, online media spreads information faster and more widely, so a story can quickly become a hot topic, and the resulting public opinion incidents can make the underlying events harder to resolve. Discovering news that may become a hot event in advance can therefore help the relevant regulatory authorities monitor the development of online public opinion, avoid cyber violence, and maintain social stability.

This thesis first collected 116,595 news items from the entertainment, science and technology, sports, finance, military, and Collection channels of Sina News, together with the corresponding 49,642,12 comments, then cleaned the data and stored it in a database. For the different types of news, NumPy, Pandas, matplotlib, and other tools were used to analyze the spatial distribution characteristics of news comments, including news category, number of participants, and news release time. The way comment data accumulates over time was analyzed to obtain its temporal distribution characteristics. Next, the sum of the number of comments and the number of points was used as the heat value of a news item, and the relationship between release time and news heat was analyzed again along the hour and week dimensions in preparation for news heat prediction.

Finally, the principles of two basic regression models, multiple Linear Regression and K-Nearest Neighbor (KNN), and of Gradient Boosting Decision Tree (GBDT) from ensemble learning are explained. The process from model construction to prediction is briefly illustrated using Sina News data as an example. Based on the distribution characteristics of the comments, relevant features were extracted, and the Linear Regression, KNN, and GBDT algorithms were used to predict news heat. The results show that GBDT predicts better than the other two. To improve prediction further, the data was divided into sports data and all other data, which were trained and predicted separately with GBDT; this improved the prediction accuracy for sports news. At the same time, to improve the overall generalization performance of the model, Linear Regression, KNN, and GBDT were used as base learners and combined with an averaging strategy. The experimental results show that the ensemble algorithm improves news heat prediction and gives the model better generalization performance.
Keywords/Search Tags:network news, distribution characteristics, GBDT, ensemble learning, heat prediction