| As urbanization accelerates in China, air pollution caused by urban construction, production, and daily life is becoming increasingly serious. Frequent haze has taken a heavy physical and psychological toll on residents. Analyzing air pollution characteristics and predicting the concentration of PM2.5 (fine particulate matter), a major pollutant, can help people take appropriate protective measures in advance. The main work and findings of the thesis are as follows. After pre-processing and Principal Component Analysis of data on six air pollutants from 2014 to 2020, the three selected principal components were clustered using the Mean Shift algorithm combined with the K-Means algorithm, classifying the 113 key cities for environmental protection into three categories. The spatial distribution of air quality across the three city groups decreased in a circular pattern from south to north, and cities of the same type were geographically clustered. Analysis of the mean values of the six air pollutants in the three city groups revealed that O3 (ozone) was "low in winter and high in summer", while the other five air pollutants were "high in winter and low in summer". Descriptive statistics were used to examine the annual, quarterly, and monthly air pollution characteristics of three typical cities (Kunming, Beijing, and Xi’an). Annual distribution characteristics: the annual average values of air pollutants in Xi’an and Beijing were decreasing, indicating that air pollution control was improving overall. The annual average values of air pollutants in Kunming fluctuated but stayed within the national limit values. Quarterly distribution characteristics: summer air quality was the best in all three cities, while winter air pollution was worse in Xi’an and Beijing, with PM2.5, SO2 (sulfur dioxide), and
CO (carbon monoxide) as the main pollutants; Kunming experienced light pollution in all other seasons. Monthly distribution characteristics: the distribution of O3 was "inverted U" shaped, with peaks from May to September and troughs from November to February of the following year, whereas the other air pollutants followed the opposite pattern. Correlation analysis was performed using the air pollutants and meteorological factors of the three typical cities from 2014 to 2020 as feature variables and PM2.5 as the target variable, yielding the degree of correlation between PM2.5 and each feature variable. To predict the daily average PM2.5 value of a typical city, the thesis used three models: a machine-learning Gradient Boosting Decision Tree model, a Linear Support Vector Machine model, and a Stacking ensemble model that combines these two models with Lasso. The test results showed that the Root Mean Squared Error and the Mean Absolute Error on the test dataset ranked as follows: Linear Support Vector Machine > Gradient Boosting Decision Tree > Stacking, while the ranking of the coefficient of determination on the test dataset was the reverse. It can be concluded that the Stacking ensemble model predicted daily average PM2.5 values better than the two individual models. |
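
The stacking setup described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, treats Lasso as the meta-learner over the Gradient Boosting Decision Tree and Linear Support Vector Machine base models (one plausible reading of "combined with Lasso"), and uses synthetic data in place of the thesis's pollutant and meteorological records. The hyperparameters, feature count, and data-generating formula are illustrative assumptions, not values from the thesis.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in for daily pollutant and meteorological features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
# Hypothetical "PM2.5" target: mostly linear signal, one interaction term, plus noise.
y = 50 + 12 * X[:, 0] - 6 * X[:, 1] + 4 * X[:, 2] * X[:, 3] + rng.normal(scale=3, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base learners: GBDT and a linear SVM regressor (features scaled for the SVM).
base_models = [
    ("gbdt", GradientBoostingRegressor(random_state=0)),
    ("svr", make_pipeline(StandardScaler(), LinearSVR(max_iter=10000, random_state=0))),
]

# Lasso is fitted on the cross-validated predictions of the base learners.
stack = StackingRegressor(estimators=base_models, final_estimator=Lasso(alpha=0.1), cv=5)
stack.fit(X_train, y_train)

# The three evaluation metrics named in the abstract.
pred = stack.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
mae = float(mean_absolute_error(y_test, pred))
r2 = float(r2_score(y_test, pred))
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```

Fitting each base model on the same split and computing the same three metrics would give the kind of comparison the abstract reports. On synthetic data the ranking need not match the thesis's result.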