
Research On Feature Analysis And Trend Prediction Of Multi-source Data In Internet Environment

Posted on: 2020-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Tang
GTID: 2428330596976761
Subject: Engineering
Abstract/Summary:
In the current Internet environment, netizens are increasingly accustomed to using search engines to look up information that interests them, and they also use microblogs, forums and other platforms to express their opinions. Because the population of netizens is so large, it leaves behind a large amount of behavioral data spread across various network platforms. This behavioral data has guiding significance for the real world. In practice, some indicator data is often not released in time because its statistical process is cumbersome. In such cases, Internet data can reflect the trend of those indicators. For example, search-engine query frequency has been shown to track the number of influenza cases closely. In addition, the dynamic data left by user communities on social networks is also used by researchers for prediction. Integrating this multi-source Internet data to improve the accuracy of real-world indicator prediction is the focus of this thesis.

To take advantage of the large volume of user behavior data in the Internet environment, one must first select the data sources that have predictive value for the target indicator. After the data is obtained, its features must be analyzed, and finally the analysis results are used for model training and prediction. The main work of the thesis therefore includes the following points:

(1) The collection and feature analysis of Internet multi-source data is studied. Taking Internet data that is indirectly related to the number of influenza cases as an example, a collection and feature analysis scheme based on Internet multi-source data is designed. The multi-source data mainly refers to search engine data and social network data. The scheme can effectively screen Internet data related to the target topic and extract its main features.

(2) A combined prediction model based on Internet multi-source data is proposed, taking the prediction of the number of influenza-like cases as an example. The model trains a separate prediction model on each Internet data source, and then uses the GBDT (gradient boosting decision tree) algorithm as a secondary learner to integrate the resulting predictions. This model predicts better than a model that uses only a single data source.

(3) Taking the tourist volume of Jiuzhaigou as an example, the collection and analysis scheme and the combined forecasting model are shown to be applicable to other real-world indicator data. In this example, the combined prediction model again outperforms the prediction models that use only a single data source.

(4) An Internet multi-source data acquisition and analysis system is designed, implemented and tested. The test results show that the system can quickly collect multi-source data and train the model, and can compare and analyze the model's fitting and prediction performance.

In summary, this thesis presents a multi-source data acquisition and feature analysis scheme based on the Internet and, building on it, proposes a combined forecasting model based on Internet multi-source data.
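
The combined model described in point (2) can be illustrated with a minimal sketch. The abstract does not say which base learners are used, how the secondary learner is trained, or what the data looks like, so everything below is an assumption made for illustration: one least-squares line per data source as the base learner, out-of-fold predictions as the inputs to the secondary learner, a small from-scratch gradient boosting of depth-1 trees standing in for a full GBDT implementation, and synthetic series named `search`, `social` and `cases`. All function names are hypothetical.

```python
from statistics import fmean


def fit_linear(x, y):
    """Least-squares line for one feature; returns a predict function."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return lambda v: my + slope * (v - mx)


def out_of_fold(x, y, k=4):
    """Predict each point with a base model that did not see it during fitting."""
    preds = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        model = fit_linear([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(y), k):
            preds[i] = model(x[i])
    return preds


def fit_stump(rows, residuals):
    """Depth-1 regression tree: the split with the lowest squared error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for cut in values[:-1]:  # both sides of the split stay non-empty
            left = [e for r, e in zip(rows, residuals) if r[j] <= cut]
            right = [e for r, e in zip(rows, residuals) if r[j] > cut]
            lm, rm = fmean(left), fmean(right)
            sse = sum((e - lm) ** 2 for e in left) + sum((e - rm) ** 2 for e in right)
            if best is None or sse < best[0]:
                best = (sse, j, cut, lm, rm)
    return None if best is None else best[1:]


def fit_gbdt(rows, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = fmean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [target - p for target, p in zip(y, pred)]
        stump = fit_stump(rows, residuals)
        if stump is None:  # no feature has two distinct values
            break
        j, cut, lm, rm = stump
        stumps.append(stump)
        pred = [p + lr * (lm if r[j] <= cut else rm) for p, r in zip(pred, rows)]

    def predict(row):
        return base + sum(lr * (lm if row[j] <= cut else rm) for j, cut, lm, rm in stumps)

    return predict


def fit_stacked(search, social, y):
    """One base model per source, integrated by the boosted secondary learner."""
    meta_rows = list(zip(out_of_fold(search, y), out_of_fold(social, y)))
    meta = fit_gbdt(meta_rows, y)
    base_search, base_social = fit_linear(search, y), fit_linear(social, y)
    return lambda s, m: meta((base_search(s), base_social(m)))


# Synthetic weekly series (assumed data, not taken from the thesis).
search = [float(i % 10) for i in range(40)]        # e.g. a search-index value
social = [float((i * 7) % 13) for i in range(40)]  # e.g. a count of related posts
cases = [3 * s + 2 * m + 5 for s, m in zip(search, social)]

predict = fit_stacked(search, social, cases)
print(predict(4.0, 6.0))
```

Training the secondary learner on out-of-fold predictions rather than in-sample ones keeps it from simply trusting whichever base model overfits most. For real time-series data, a forward-chaining split would be a more appropriate choice than the interleaved folds used here.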
Keywords/Search Tags: Internet, multi-source data, feature analysis, trend forecasting