In recent years, tourism has developed rapidly and has become an important part of China's national economy. Alongside this growth, problems such as tourists stranded in tourist areas, overcrowding at scenic spots, and poor allocation of resources between off-peak and peak seasons have emerged. To allocate tourism resources reasonably and promote the healthy, steady, and sustainable development of tourism, effective short-term prediction of tourist flow is particularly important. However, traditional tourist flow prediction usually builds time series models from historical data, so it lags considerably and relies entirely on historical data. In the Internet era, web search data offers advantages such as timely release and fast updates. This paper takes the cities of Sanya and Guilin as research objects, focuses on the Baidu search index platform, establishes a tourist flow prediction model based on Internet search data, and uses the search data of the current month to predict the tourist flow of the next month. First, keywords are obtained through multiple iterations of Baidu's related-recommendation function. The daily Baidu search index data for each keyword are collected with a web crawler and summed by month to give the monthly search volume of each keyword. Many keywords correlate only weakly with tourist flow and have little predictive ability, so this paper selects core keywords using the Dynamic Time Warping method. Core keywords with the same attribute are summed to form a search index, and the resulting composite search indexes are the variables used for prediction. Compared with the Pearson coefficient method used in previous studies, the Dynamic Time Warping method handles distortion along the time axis better, tolerates outliers, and is less likely to miss relevant keywords. Because web search data contain noise, this paper introduces
the Empirical Mode Decomposition (EMD) method for noise reduction. Rather than merging everything into a single index and denoising it once, this paper applies Empirical Mode Decomposition to each search index separately, which avoids noise interference between different search indexes to some extent and improves prediction accuracy. Second, considering the nonlinear characteristics of web search data, this paper uses the Support Vector Regression (SVR) method to fit the training samples and establish the tourist flow prediction model. The insensitive loss parameter, penalty parameter, and kernel width of the Support Vector Regression model are optimized by grid search. The resulting model is applied to the test set to obtain the predicted values. Mean absolute error (MAE), mean absolute percentage error (MAPE), and mean squared error (MSE) are used to evaluate the model's predictive performance. In addition, to make the results more convincing, the data for each city were split into training and test sets in three different ways, yielding three experimental results, each of which is analyzed. To verify the soundness and validity of the EMD-SVR model, this paper compares it with a linear regression model, an SVR model, and a BP neural network model, and plots comparison curves of their prediction results. The results show that the EMD-SVR model performs best, with the highest prediction accuracy of the models compared. The findings indicate that search behavior signals tourists' willingness to make travel decisions, and that web search behavior has some predictive power for economic sectors, especially tourism, which offers a new perspective for research on tourist flow prediction. Web search data can be used to predict tourist flow, and denoising these data improves prediction accuracy. The support vector regression model has certain advantages in dealing with
nonlinear small-sample prediction problems.
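
The keyword-selection step relies on Dynamic Time Warping. As a minimal sketch, assuming the classic dynamic-programming formulation with an absolute-difference local cost (the abstract does not state which variant, normalization, or selection threshold is used), the distance between a keyword's monthly search series and the tourist flow series could be computed as below. The function name `dtw_distance` and the sample series are hypothetical and purely illustrative.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (illustrative sketch)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again (stretch b)
                cost[i][j - 1],      # b[j-1] is matched again (stretch a)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Made-up series, assumed to be already scaled to a comparable range.
flow = [1.0, 2.0, 3.0, 2.0, 1.0]
shifted = [1.0, 1.0, 2.0, 3.0, 2.0, 1.0]   # same shape, delayed by one step
unrelated = [3.0, 1.0, 3.0, 1.0, 3.0]

print(dtw_distance(flow, shifted))
print(dtw_distance(flow, unrelated))
```

In this toy case the delayed series aligns with the flow series at zero cost, while the unrelated series does not. A pointwise measure such as the Pearson coefficient would penalize the delay, which is the kind of time-axis distortion the abstract says Dynamic Time Warping handles better.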
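
The three error measures used for evaluation follow their standard definitions. The sketch below assumes MAPE is reported in percent and that no actual value is zero; the function names and the numbers are hypothetical and are not results from this paper.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up monthly tourist counts and predictions, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mae(actual, predicted), mape(actual, predicted), mse(actual, predicted))
```

Lower values indicate a better fit on all three measures. MAPE is scale-free, which makes it the more convenient of the three for comparing cities with very different visitor volumes.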