Study On Data Fusion Of The Large Scale Carbon Cycle Model Based On Spark

Posted on:2019-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:L J He

GTID:2428330569496537

Subject:Computer application technology

Abstract/Summary:

The carbon cycle of terrestrial ecosystems is a complex process, and the models that describe it often contain parameters that are difficult to estimate directly. Model-data fusion is an important means of estimating these parameters accurately: it reduces parameter uncertainty and improves the accuracy of model simulations. As a result, model-data fusion has become a research hotspot in ecosystem carbon cycle studies. Ecology is entering a data-intensive era, and with the explosive growth of data, traditional computing platforms struggle to support rapid analysis. Performing model-data fusion both accurately and efficiently at large scale has therefore become a challenging problem. Building on a study and analysis of HDFS, Spark, and probabilistic programming frameworks, this thesis proposes a Spark-based method for large-scale carbon cycle model-data fusion. Its three main contributions are as follows.

(1) Building on research into model-data fusion, this thesis presents a parameter optimization method for ecosystem models based on Bayesian machine learning, using the No-U-Turn Sampler (NUTS). The method avoids the redundant exploration caused by random-walk behavior and thus improves the efficiency of parameter optimization.

(2) To address the shortcomings of traditional computing platforms, such as small storage capacity, low resource utilization, and frequent disk I/O, this thesis proposes a Spark-based method for large-scale carbon cycle model-data fusion that performs the fusion accurately and efficiently in large-scale scenarios.

(3) Based on a study and analysis of Spark, this thesis proposes a new optimization scheme for Spark's configuration parameters that makes full use of cluster resources and reduces cost.

Taking China's forest ecosystems as a case study, the thesis applies the Spark-based method to parameter optimization and process simulation with the DALEC carbon cycle model. The results show that both single-site and large-scale optimization perform well, and both agree with the expected results and with expert knowledge. With default configuration parameters, Spark improves computational efficiency by 74.9% over a single-machine setup and by 55.9% over MPI, showing that Spark effectively improves data processing efficiency. After configuration tuning, Spark improves computational efficiency by 93.3%, 88.3%, and 29.7% over the single-machine setup, MPI, and default-configuration Spark, respectively. In summary, Spark-based large-scale carbon cycle model-data fusion preserves the quality of the parameter optimization while providing greater computing capacity and strong data processing capability.
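The abstract does not publish the sampler code for contribution (1), and it mentions a probabilistic programming framework without naming it here. As a hedged illustration of the idea only, the following is a minimal NumPy sketch of the efficient No-U-Turn Sampler described by Hoffman and Gelman (2014). It assumes a fixed step size (no dual-averaging adaptation), an identity mass matrix, and a hypothetical toy 2-D Gaussian target standing in for a DALEC parameter posterior; the names `leapfrog`, `no_u_turn`, `build_tree`, `nuts`, and `MU` are illustrative. NUTS follows gradient-guided Hamiltonian trajectories, doubling them until they begin to double back on themselves (a "U-turn"), which is how it avoids the back-and-forth redundancy of random-walk samplers.

```python
import numpy as np

DELTA_MAX = 1000.0  # stop extending a trajectory once the energy error exceeds this (divergence guard)


def leapfrog(theta, r, eps, grad_logp):
    """One leapfrog step of Hamiltonian dynamics with an identity mass matrix."""
    r = r + 0.5 * eps * grad_logp(theta)
    theta = theta + eps * r
    r = r + 0.5 * eps * grad_logp(theta)
    return theta, r


def no_u_turn(theta_minus, theta_plus, r_minus, r_plus):
    """True while neither end of the trajectory is moving back toward the other end."""
    dtheta = theta_plus - theta_minus
    return bool(dtheta @ r_minus >= 0 and dtheta @ r_plus >= 0)


def build_tree(theta, r, log_u, v, j, eps, logp, grad_logp, rng):
    """Take 2**j leapfrog steps in direction v. Returns the trajectory ends,
    a proposal, its count of in-slice points, and a keep-going flag."""
    if j == 0:
        theta1, r1 = leapfrog(theta, r, v * eps, grad_logp)
        joint = logp(theta1) - 0.5 * r1 @ r1
        n1 = int(log_u <= joint)              # point lies inside the slice
        s1 = bool(log_u < joint + DELTA_MAX)  # trajectory has not diverged
        return theta1, r1, theta1, r1, theta1, n1, s1
    tm, rm, tp, rp, theta1, n1, s1 = build_tree(theta, r, log_u, v, j - 1, eps, logp, grad_logp, rng)
    if s1:
        if v == -1:
            tm, rm, _, _, theta2, n2, s2 = build_tree(tm, rm, log_u, v, j - 1, eps, logp, grad_logp, rng)
        else:
            _, _, tp, rp, theta2, n2, s2 = build_tree(tp, rp, log_u, v, j - 1, eps, logp, grad_logp, rng)
        # Pick between the two subtree proposals in proportion to their in-slice points.
        if n1 + n2 > 0 and rng.uniform() < n2 / (n1 + n2):
            theta1 = theta2
        s1 = s2 and no_u_turn(tm, tp, rm, rp)
        n1 += n2
    return tm, rm, tp, rp, theta1, n1, s1


def nuts(logp, grad_logp, theta0, eps, n_samples, max_depth=10, seed=0):
    """Draw n_samples from the density proportional to exp(logp) with fixed step size eps."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))
    for m in range(n_samples):
        r0 = rng.standard_normal(theta.size)
        # Slice variable u ~ Uniform(0, exp(joint)), stored as log u = joint - Exp(1).
        log_u = logp(theta) - 0.5 * r0 @ r0 - rng.exponential()
        tm, tp, rm, rp = theta, theta, r0, r0
        j, n, s = 0, 1, True
        while s and j < max_depth:
            v = -1 if rng.uniform() < 0.5 else 1  # double the trajectory backward or forward
            if v == -1:
                tm, rm, _, _, theta1, n1, s1 = build_tree(tm, rm, log_u, v, j, eps, logp, grad_logp, rng)
            else:
                _, _, tp, rp, theta1, n1, s1 = build_tree(tp, rp, log_u, v, j, eps, logp, grad_logp, rng)
            if s1 and rng.uniform() < n1 / n:  # accept with probability min(1, n1 / n)
                theta = theta1
            n += n1
            s = s1 and no_u_turn(tm, tp, rm, rp)
            j += 1
        samples[m] = theta
    return samples


# Hypothetical toy target standing in for a model posterior: a 2-D Gaussian with mean MU.
MU = np.array([1.0, -2.0])
draws = nuts(lambda t: -0.5 * (t - MU) @ (t - MU), lambda t: -(t - MU),
             theta0=[0.0, 0.0], eps=0.3, n_samples=2000, seed=1)
print(draws[200:].mean(axis=0))  # should lie close to MU after discarding burn-in
```

In practice one would rely on a tested implementation with step-size adaptation, such as the NUTS samplers in PyMC or Stan; the recursion is spelled out here only to show how trajectory doubling and the U-turn test work.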
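Contribution (3) tunes Spark configuration parameters, but the abstract does not say which ones or describe the cluster. A common starting point for such tuning is the executor-sizing heuristic sketched below in plain Python; it is an assumption for illustration, not the thesis's scheme. It assumes a YARN-managed cluster, reserves one core and 1 GB per node for the OS and Hadoop daemons, uses about five cores per executor, leaves one executor slot for the YARN ApplicationMaster, keeps roughly 10% of executor memory back for `spark.executor.memoryOverhead`, and targets about two tasks per core (the Spark tuning guide recommends 2 to 3 tasks per CPU core). The function name `size_executors` and the example cluster (6 nodes, 16 cores and 64 GB each) are hypothetical; the returned keys are standard Spark properties.

```python
import math


def size_executors(nodes, cores_per_node, mem_gb_per_node, cores_per_executor=5):
    """Heuristic Spark-on-YARN executor sizing; returns Spark properties as strings."""
    usable_cores = cores_per_node - 1    # leave one core per node for the OS and HDFS/YARN daemons
    usable_mem_gb = mem_gb_per_node - 1  # leave 1 GB per node for the same daemons
    execs_per_node = usable_cores // cores_per_executor
    if execs_per_node < 1:
        raise ValueError("cores_per_executor exceeds the usable cores on a node")
    total_execs = nodes * execs_per_node - 1  # one slot goes to the YARN ApplicationMaster
    # Split node memory across executors, then hold back ~10% for off-heap overhead
    # (Spark's default spark.executor.memoryOverhead is 10% of executor memory, min 384 MiB).
    heap_gb = math.floor(usable_mem_gb / execs_per_node / 1.10)
    return {
        "spark.executor.instances": str(total_execs),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{heap_gb}g",
        # About two tasks per core keeps every core busy while slower tasks finish.
        "spark.default.parallelism": str(2 * total_execs * cores_per_executor),
    }


conf = size_executors(nodes=6, cores_per_node=16, mem_gb_per_node=64)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

For the hypothetical cluster this yields 17 executors with 5 cores and 19 GB each and a default parallelism of 170. The printed `--conf key=value` pairs can be passed to `spark-submit`, or the dictionary can be applied in PySpark with `SparkConf().setAll(conf.items())`.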

Keywords/Search Tags:

Spark, Large scale, Model data fusion, Parameter estimation

Related items

1	Research On Parameters Estimation Of Wireless Channel Sounding
2	Research Of Large-scale Data Mining Technology Based On Spark
3	Design And Implementation Of Spectral Clustering Algorithm For Large Scale Data
4	Fast Analysis Of Large-scale Wafer Inspection Data
5	Study On Three-way Decisions Clustering Ensemble Based On Spark
6	The Analysis And Monitoring Of Data Models In Different E-commerce Rule Engines
7	Large-Scale, Low-Latency State Estimation Of Cyberphysical Systems With An Application To Traffic Estimation
8	Training Large-Scale Statistical Machine Translation Models On Spark
9	Research On Large-scale Complex Network Community Detection Algorithm Based On Spark
10	Research On Large-scale Traffic Classification Technology Based On Spark Performance Optimization