With economic development around the world, human impact on the living environment is deepening. This is reflected in environmental pollution, of which greenhouse gas emission is an important source. Greenhouse gas time series data can be obtained from satellite remote sensing, stations, and other sources. Because these data are updated rapidly and their volume is huge, how to process, store, and mine them, and thereby provide guidance for environmental pollution prevention and control, has become a focus in coping with the severe situation of global climate change. In this paper, we solve the problem of processing and storing greenhouse gas time series by applying distributed big data storage technology; we use data mining algorithms to break the "data island" problem of time series data; finally, building on the above analysis and PyQt, we design and implement a visualization platform for big data mining and analysis of greenhouse gases to solve the application problem. The main work is as follows.

(1) Greenhouse gas (GHG) time-series data processing, storage, and mining analysis. The multi-source heterogeneous water vapor and carbon dioxide time-series data files are read, cleaned, integrated, and converted; the processed massive data are stored in an HBase database, with the storage optimized; finally, data mining methods such as Mann-Kendall trend analysis and EOF (empirical orthogonal function) modal decomposition are applied to these time series to mine spatial and temporal features and discuss influencing factors. The analysis results show that there are abrupt change points in the interannual variation of water vapor and carbon dioxide in China, which show different distributions under the influence of different latitudes, air pressure layers, and climate environments, and
have correlations of varying strength with surface temperature factors.

(2) GHG time-series data estimation and combined prediction model design. A combined SARIMA-At-LSTM prediction method is proposed: the SARIMA model predicts the linear part of the time series and yields the residuals between the predicted and true values, which are then passed into an LSTM model with an attention mechanism to obtain residual predictions. The combined model is compared with the individual models. The experimental results show that the improved prediction model has higher prediction accuracy for long-term GHG time series data, with a fitting coefficient as high as 0.989, and can estimate future trend changes of the time series well.

(3) A visualization platform for greenhouse gas big data mining and analysis, based on data mining algorithms, is constructed. First, the overall system requirements and module design are planned, adopting a three-layer architecture of data layer, business layer, and application layer, divided into a data processing and storage module, a time-series mining and analysis module, and a data prediction and analysis module; then the signal-slot mechanism, multi-threading, and other technologies are used to implement these modules; finally, the data mining and analysis algorithms are combined with the platform's real-time requirements for time-series data to realize functions such as data fusion and data analysis and mining.

Figure [54] Table [16] Reference [68]...
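As an illustration of the Mann-Kendall trend analysis named in (1), the following is a minimal sketch of the basic (non-sequential) test, not the implementation used in this work. It assumes a short series without tied values, so no tie correction is applied; the function name `mann_kendall` and the sample values are hypothetical. Detecting abrupt change points requires the sequential form of the test, which is not shown here.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p) for the basic Mann-Kendall trend test.

    Assumption: the series has no tied values, so the variance of S
    is used without a tie correction.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S sums the sign of every later-minus-earlier difference (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis of no trend (no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Made-up, strictly increasing annual means (illustrative values only).
values = [10.0 + 0.5 * k for k in range(10)]
s, z, p = mann_kendall(values)
print(f"S={s}, Z={z:.3f}, p={p:.5f}")
```

A positive Z with a small p-value indicates a significant upward trend, and a negative Z a downward one. The pairwise loop is quadratic in the series length, which is acceptable for interannual series but would need a faster formulation for long, high-frequency records.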