Study On Data Fusion Of The Large Scale Carbon Cycle Model Based On Spark

Posted on: 2019-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: L J He
Full Text: PDF
GTID: 2428330569496537
Subject: Computer application technology
Abstract/Summary:
The carbon cycle of terrestrial ecosystems is a complex process, and models of its mechanisms often contain parameters that are difficult to estimate directly. Model-data fusion is an important means of estimating model parameters accurately: it reduces parameter uncertainty and improves simulation accuracy, and it has therefore become a research hotspot in ecosystem carbon cycle studies. With the arrival of the data-intensive era in ecology and the explosive growth of data, traditional computing platforms struggle to meet the need for rapid analysis. Performing model-data fusion accurately and efficiently in large-scale scenarios has become a challenging problem. Building on a study of HDFS, Spark, and probabilistic programming frameworks, this thesis proposes a Spark-based method for large-scale carbon cycle model-data fusion. The work has three main parts.

(1) Based on a study of model-data fusion, this thesis presents a parameter optimization method for ecosystem models that is grounded in Bayesian machine learning and uses the No-U-Turn Sampler (NUTS). The method avoids the redundant exploration caused by random-walk behavior and thus improves the efficiency of parameter optimization.

(2) To address the small storage capacity, low resource utilization, and frequent disk I/O of traditional computing platforms, this thesis proposes a Spark-based method for large-scale carbon cycle model-data fusion that performs the fusion accurately and efficiently in large-scale scenarios.

(3) Based on a study of Spark, this thesis proposes an optimization scheme for Spark configuration parameters that makes fuller use of cluster resources and reduces cost.

Taking the Chinese forest ecosystem as an example, this thesis applies the Spark-based method to carry out parameter optimization and process simulation for the DALEC carbon cycle model. The results show that both single-site and large-scale optimization work well, and that both agree with the expected results and with expert knowledge. Compared with a single machine and with MPI, Spark with default configuration parameters improves computational efficiency by 74.9% and 55.9%, respectively, which indicates that Spark can effectively improve data processing efficiency. Compared with a single machine, MPI, and Spark with default configuration parameters, the parameter-tuned Spark improves computational efficiency by 93.3%, 88.3%, and 29.7%, respectively. In summary, Spark-based large-scale carbon cycle model-data fusion preserves the quality of parameter optimization while providing stronger computing capacity and data processing capability.
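The idea in part (1), using gradient information instead of a random walk to explore the posterior, can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses plain Hamiltonian Monte Carlo with a fixed trajectory length (NUTS additionally chooses the trajectory length automatically), and it replaces DALEC with a hypothetical one-pool decay model C(t) = C0 * exp(-k * t). All names and values below (`C0`, `SIGMA`, `TIMES`, `hmc`, `estimate_k`, the step size) are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy setup (not DALEC): one carbon pool, C(t) = C0 * exp(-k * t).
C0 = 100.0                  # assumed known initial pool size
SIGMA = 2.0                 # assumed known observation noise (std. dev.)
TIMES = list(range(1, 10))  # assumed observation times


def simulate(k):
    """Model prediction at each observation time for turnover rate k."""
    return [C0 * math.exp(-k * t) for t in TIMES]


def make_observations(k_true, rng):
    """Synthetic observations: model output plus Gaussian noise."""
    return [m + rng.gauss(0.0, SIGMA) for m in simulate(k_true)]


def log_post(k, obs):
    """Gaussian log-likelihood with a flat prior on k (up to a constant)."""
    return -sum((y - m) ** 2 for y, m in zip(obs, simulate(k))) / (2 * SIGMA ** 2)


def grad_log_post(k, obs):
    """Derivative of log_post with respect to k, using dm/dk = -t * m."""
    preds = simulate(k)
    return sum((y - m) * (-t * m) for y, m, t in zip(obs, preds, TIMES)) / SIGMA ** 2


def hmc(obs, k0, n_samples, step, n_leapfrog, rng):
    """Plain HMC with a fixed number of leapfrog steps per proposal."""
    k = k0
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)          # resample momentum
        k_new, p = k, p0
        p += 0.5 * step * grad_log_post(k_new, obs)   # initial half step
        for i in range(n_leapfrog):
            k_new += step * p                          # full position step
            if i < n_leapfrog - 1:
                p += step * grad_log_post(k_new, obs)  # full momentum step
        p += 0.5 * step * grad_log_post(k_new, obs)   # final half step
        h_old = -log_post(k, obs) + 0.5 * p0 * p0
        h_new = -log_post(k_new, obs) + 0.5 * p * p
        # Metropolis correction for the leapfrog discretization error
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            k = k_new
        samples.append(k)
    return samples


def estimate_k(seed=0):
    """Posterior mean of k from synthetic data generated with k = 0.3."""
    rng = random.Random(seed)
    obs = make_observations(0.3, rng)
    samples = hmc(obs, k0=0.35, n_samples=2000, step=0.002, n_leapfrog=15, rng=rng)
    kept = samples[500:]                  # discard burn-in
    return sum(kept) / len(kept)


if __name__ == "__main__":
    print("posterior mean of k:", estimate_k())
```

In a large-scale setting, a per-site estimate of this kind is an independent task, which is what makes the problem a natural fit for a data-parallel platform such as Spark; the step size and trajectory length here are hand-picked for this toy posterior and would need tuning (or NUTS-style adaptation) for a real model.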
Keywords/Search Tags: Spark, large scale, model-data fusion, parameter estimation