In mid-December 2019, a number of patients with "unexplained pneumonia" began to appear in Wuhan. The disease was caused by a novel coronavirus and was later named "COVID-19". The epidemic spread rapidly to many provinces, and many regions launched a "level I response to major public health emergencies". Since March 2020, outbreaks have also occurred in many other countries, and the global pandemic situation is serious. Accurately predicting the future trend of the epidemic from the available data is therefore of great significance for government departments in formulating scientific and reasonable prevention and control policies. At present, most models for predicting the trend of the novel coronavirus pneumonia epidemic are infectious disease dynamics models, mainly the SEIR model and its extensions. Because these models differ in their parameter settings and in the stage of the epidemic at which they are applied, their predictions vary widely. Based on a descriptive statistical analysis of epidemic data from China and the United States, and in view of the actual development of the epidemic and the characteristics of the data, this paper selects the epidemic data of the United States as the research object, builds prediction models, and uses the epidemic data of Maryland for empirical analysis, in order to find a more accurate prediction method.

First, in Chapter 2, an SIRD model is established from the classical dynamics model of infectious diseases, and the global search capability of a genetic algorithm is used to obtain the optimal parameter values. The results show that the model fits the training set well, but the predicted trend deviates considerably from the real situation. Because the actual course of the epidemic does not satisfy the basic assumptions of the model, prediction models based on machine learning algorithms are studied next.

In Chapter 3, prediction models for the novel coronavirus pneumonia epidemic are built with four machine learning algorithms. The daily number of new confirmed cases is the dependent variable. Lagged values of the data, with the lag measured in days, and the day and week numbers of the outbreak are selected as independent variables. The RFECV function (recursive feature elimination with cross-validation) is used for feature selection, and the mean absolute error (MAE) is used to measure prediction accuracy. A comprehensive analysis ranks the prediction performance of the four models as: linear regression model > random forest model > XGBoost model > support vector regression model. A rolling prediction method is then added to improve the predictive ability of the models. The results show that the predictions of all four machine learning models improve to varying degrees; the XGBoost model performs best, with its error reduced by 39.69%, and it achieves ideal prediction accuracy.

Furthermore, in Chapter 4, a neural network prediction model based on the long short-term memory (LSTM) network structure is established. The results show that the prediction accuracy of this deep learning algorithm is better than that of the four machine learning algorithms. Finally, a weighted combination method is used to optimize the models. The weighted combination of all five models does not predict well, whereas the weighted combination of two models, the linear regression model and the LSTM model, gives the better optimization effect and the highest prediction accuracy.
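The rolling prediction scheme and the MAE criterion summarized above can be illustrated with a minimal sketch. It is not the implementation used in this paper: it assumes a one-step-ahead forecast that is refit each day on all data observed so far, uses plain least squares as a stand-in for the linear regression model, and runs on a synthetic case series. The function names (`make_lagged`, `rolling_forecast`, `mae`) and the lag count are hypothetical choices made for illustration only.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Row j of X holds n_lags consecutive values; y[j] is the value that follows them."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags
    X = np.column_stack([series[i:i + n_rows] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def fit_linear(X, y):
    """Ordinary least squares with an intercept (assumed stand-in for the regression model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def rolling_forecast(series, n_lags, n_test):
    """Forecast the last n_test days one step at a time, refitting before each step."""
    X, y = make_lagged(series, n_lags)
    split = len(y) - n_test
    preds = []
    for t in range(split, len(y)):
        coef = fit_linear(X[:t], y[:t])  # train only on days observed before day t
        preds.append(predict_linear(coef, X[t:t + 1])[0])
    return np.array(preds), y[split:]


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


if __name__ == "__main__":
    # Synthetic daily new-case counts, for illustration only (not real epidemic data).
    rng = np.random.default_rng(0)
    days = np.arange(120)
    cases = 500 + 4 * days + 50 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 20, 120)
    preds, actual = rolling_forecast(cases, n_lags=7, n_test=14)
    print("rolling one-step MAE:", mae(actual, preds))
```

Refitting before each step lets the model absorb the newest observations, which is one plausible reason a rolling scheme can reduce error relative to a single fit on a fixed training set.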