In recent years, environmental pollution has become a focus of public attention. Haze affects people's health and quality of life, so establishing an early warning system for air pollution has become an important need. Exploring the factors that influence PM2.5 and understanding how PM2.5 concentrations change have become a research focus for domestic researchers. Unlike traditional PM2.5 prediction based on single-source, single-structure data, this paper proposes an air quality prediction method based on multi-source heterogeneous data fusion. An SVR air quality prediction model based on an improved Sequential Minimal Optimization algorithm (SMO-SVR) is established and compared, through experimental simulation, with SVR optimized by the Particle Swarm Optimization algorithm (PSO-SVR) and by the Genetic Algorithm (GA-SVR). The main research contents of this paper are as follows:

(1) PM2.5-related image feature extraction. By analyzing the influence of atmospheric PM2.5 on images, studying the principles of digital images, and reviewing the relevant literature, we extracted five features from among image transformation features, algebraic features, texture features and color features: spatial contrast, dark channel intensity, and the HSI color-space difference (three dimensions). We then verified the correlation between these image features and PM2.5 concentration.

(2) Image-based air quality prediction method. Image-based PM2.5 prediction is a branch of PM2.5 prediction technology that has developed in recent years. To verify its feasibility, we first extract image features and then establish a mixed-kernel Support Vector Regression model optimized by Column Generation (CG-SVR). In this model, a mixed kernel matrix is constructed from the linear, polynomial and RBF kernel functions. The Column Generation algorithm repeatedly selects columns from the kernel matrix to optimize the model parameters until the model accuracy meets the requirements.

(3) Multi-source heterogeneous data fusion technology. Multi-source heterogeneous data fusion is a data processing technology that integrates data from multiple sources with different structures, achieving information fusion across those sources. Complementary information increases confidence in the data, improves reliability and reduces uncertainty, so a regression prediction model based on multi-source heterogeneous data can produce more comprehensive estimates and decisions. We study multi-source heterogeneous data fusion based on multi-kernel learning. The multi-kernel learning method uses different kernel functions to map different features into the same feature space, which fuses the feature spaces of the heterogeneous data, and places a regression model on top of the fused feature space to perform prediction.

(4) A multi-kernel SVR prediction model optimized by the improved SMO algorithm. Combining SVR with the multi-kernel extension method, we establish an SVR model based on an extended composite kernel. Considering both the learning ability and the generalization performance of the prediction model, the polynomial kernel and the RBF kernel are selected to construct the extended composite kernel. By constructing the extended composite kernel matrix, the general SVR model is extended to a regression model that can make predictions from multi-source heterogeneous data. The extended composite kernel matrix retains all the information of the original kernel matrices and avoids information loss. Finally, the improved SMO algorithm is used to optimize the parameters, which significantly increases the convergence speed of the improved model.
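
The kernel-level fusion described in (3) and (4) can be illustrated with a minimal sketch. It uses a simple weighted sum of a polynomial kernel and an RBF kernel, which is one common way to combine kernels and is not necessarily the extended composite kernel built in this paper. The names `image_feats`, `other_feats` and `weight`, the kernel hyperparameters, the toy data, and the choice of which feature group receives which kernel are all hypothetical.

```python
import math


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def composite_kernel_matrix(image_feats, other_feats, weight=0.5):
    """Gram matrix of weight * poly(image) + (1 - weight) * rbf(other).

    Assumption: each sample i has one feature vector per source, and
    the two sources are fused by a convex combination of their kernels.
    """
    n = len(image_feats)
    return [
        [
            weight * poly_kernel(image_feats[i], image_feats[j])
            + (1.0 - weight) * rbf_kernel(other_feats[i], other_feats[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy data: two samples, two features from each source.
image_feats = [[0.2, 0.4], [0.1, 0.9]]
other_feats = [[1.0, 0.0], [0.0, 1.0]]
K = composite_kernel_matrix(image_feats, other_feats)
```

A convex combination of valid kernels is itself a valid kernel, so a matrix such as `K` could be passed to any SVR solver that accepts a precomputed Gram matrix.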