Research On PM2.5 Concentration Model Based On Random Forest

Posted on:2019-07-06

Degree:Master

Type:Thesis

Country:China

Candidate:X Du

GTID:2371330545464161

Subject:Electronic and communication engineering

Abstract/Summary:

PM2.5 particulates, the main hazardous component of smog, not only seriously threaten human health and damage the natural environment but also have a major impact on China's economic development. Scientific and accurate prediction of PM2.5 helps environmental protection departments formulate preventive and remedial measures, provides a scientific basis for government policy, and reduces harm to human health. This thesis reviews and analyzes the research progress and prediction methods for PM2.5. On that basis, combining machine learning theory with statistical forecasting methods, it establishes a new PM2.5 concentration prediction model (the RFP model), based on the random forest algorithm, to predict the daily average PM2.5 concentration. The main work is as follows:

(1) Xi'an, an area with high PM2.5 concentrations, was selected as the research object. A crawler with five functional modules was designed in Python on the Scrapy framework to automatically collect historical data for Xi'an from multiple websites. The data cover October 28, 2013 to January 31, 2018 and include air pollutant concentrations (PM2.5, PM10, SO2, NO2, CO, O3) and meteorological conditions (temperature, dew point, humidity, sea-level pressure, visibility, wind speed, wind direction, wind force, weather conditions). The raw data were extensively preprocessed using Newton interpolation, the 3σ criterion, correction by the average of the preceding and following values, one-hot encoding, and other techniques, improving the quality of the PM2.5 experimental data. On this basis, a high-quality training data set designed specifically for PM2.5 concentration prediction research was constructed.

(2) Using statistical theory, the magnitude and direction of the correlation between PM2.5 and its influencing factors were qualitatively analyzed and displayed through correlation coefficients (including analysis of variance) and visualization. Exploratory analysis showed that the season (spring, summer, autumn, winter) and the atmospheric pollutant concentrations and meteorological conditions of the preceding 3 days affect the PM2.5 concentration on a given day. The correlation analysis provides the data and feature basis for building the model, as well as a reference and theoretical basis for studying the formation, sources, and influencing factors of PM2.5.

(3) Features were first preselected with the correlation coefficient method, a filter method. A total of 34 highly relevant features were selected to establish the RFP-M1 and RFP-M2 models, respectively. Using the random forest method as a wrapper method, 17 features were further screened and the RFP-M3 model was established. The RFP-M4 model was then built by optimizing the parameter combination with grid search and cross-validation. The performance of the four models was analyzed and compared. Finally, the RFP model was compared with the BP-NN (back propagation neural network) model in terms of principle and method, and its prediction results were compared with those of other algorithms, including linear regression (LR), decision tree (DT), and support vector machine (SVM). The experimental results show that the proposed RFP model not only predicts PM2.5 concentration effectively but also improves operating efficiency without sacrificing prediction accuracy: its running cost is only 2.1% of that of the BP-NN model.
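
The preprocessing named in step (1) can be illustrated with a minimal sketch. It is not the thesis's code: the function names, the use of the population standard deviation, and the choice to fill missing values by neighbor averaging are all assumptions made for illustration (the abstract says missing values were handled with Newton interpolation, which is not shown here).

```python
from statistics import mean, pstdev


def three_sigma_outliers(values, k=3.0):
    """Return indices of values more than k standard deviations from the mean."""
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), pstdev(present)
    return [i for i, v in enumerate(values)
            if v is not None and abs(v - mu) > k * sigma]


def neighbor_mean_fill(values, bad_indices):
    """Replace flagged or missing entries with the mean of the nearest valid
    value before and after them (hypothetical stand-in for the thesis's
    'before and after average' correction)."""
    bad = set(bad_indices) | {i for i, v in enumerate(values) if v is None}
    out = list(values)
    for i in sorted(bad):
        prev = next((values[j] for j in range(i - 1, -1, -1) if j not in bad), None)
        nxt = next((values[j] for j in range(i + 1, len(values)) if j not in bad), None)
        neighbors = [v for v in (prev, nxt) if v is not None]
        out[i] = mean(neighbors) if neighbors else None
    return out


def one_hot(labels):
    """Encode categorical labels (e.g. wind direction) as 0/1 indicator rows."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows
```

A typical use would flag outliers in a daily PM2.5 series with `three_sigma_outliers`, pass the flagged indices to `neighbor_mean_fill`, and encode categorical fields such as wind direction with `one_hot`.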
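
Steps (2) and (3) describe using the preceding 3 days of data as predictors and preselecting features by correlation coefficient. The sketch below shows one way this could look, assuming Pearson correlation and a hypothetical cutoff of 0.3; the abstract states neither the coefficient type nor the threshold, and the helper names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def add_lags(columns, max_lag=3):
    """Build lagged copies of each daily series.

    columns maps a name to a list of daily values. Each output column
    '<name>_lag<k>' is aligned with the target days max_lag .. n-1.
    """
    n = len(next(iter(columns.values())))
    lagged = {}
    for name, series in columns.items():
        for k in range(1, max_lag + 1):
            lagged[f"{name}_lag{k}"] = series[max_lag - k: n - k]
    return lagged


def filter_by_correlation(features, target, threshold=0.3):
    """Keep features whose |r| with the target meets the (assumed) threshold."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}
```

The target passed to `filter_by_correlation` must be trimmed to the same days as the lagged columns, for example `pm25[3:]` when `max_lag=3`.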
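
The random-forest screening and the grid search with cross-validation in step (3) might be sketched with scikit-learn as follows. Everything specific here is an assumption: the data are synthetic, the parameter grid is arbitrary, and ranking by impurity-based feature importance is a simple stand-in for the thesis's screening procedure, which the abstract does not detail.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 6 candidate features, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Fit a baseline forest and rank features by impurity-based importance.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
top_k = 2  # hypothetical number of features to keep
selected = np.sort(ranking[:top_k])

# Tune hyperparameters on the reduced feature set with 5-fold cross-validation.
param_grid = {
    "n_estimators": [50, 100],
    "max_features": [1, 2],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X[:, selected], y)
best_params = search.best_params_
best_model = search.best_estimator_
```

With real data, `X` would hold the preselected features and `y` the daily average PM2.5 concentration. Because daily observations are time-ordered, a time-aware splitter such as `TimeSeriesSplit` may be a better choice for `cv` than the default shuffled-free K-fold used here.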

Keywords/Search Tags:

PM2.5 concentration prediction, random forest, Scrapy, correlation analysis, BP neural network

Related items

1	PM2.5 Concentration Prediction Based On Space-time Mixed Model
2	Research On PM2.5 Concentration Prediction Method Based On CART-LSTM
3	Prediction Of PM2.5 Concentration In Beijing
4	Air Quality Forecasting Using BP Neural Network And Random Forest Model
5	Study On PM10 Concentration Prediction Of Cement Commercial Concrete Based On Data Mining
6	Urban PM2.5 Concentration Prediction Based On Parallel Random Forest
7	Estimation Of PM2.5 Concentration And Analysis Of Spatial And Temporal Distribution Based On AOD Data
8	Research On Dust Concentration Prediction Of Open-Pit Mine Based On Random Forest-Markov Model
9	Research On Application Of Improved QPSO_RBF Neural Network In Air Quality Forecast
10	Construction Of Data Driven Shale Gas Productivity Forecast Model Based On Deep Neural Network