Research On Key Technologies Of Spatial Data Visualization With Spark

Posted on: 2019-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: X J Xi
GTID: 2428330572455621
Subject: Computer system architecture
Abstract/Summary:
The rapid development of information technology has led human society into the digital information era. As data volumes grow, a period of massive data explosion is arriving worldwide. Spatial data accounts for a large share of this newly generated data, yet much of it cannot be understood directly by users. Visualization is the best way to demonstrate the value of such data. However, faced with huge volumes of spatial data, even high-performance computers struggle to meet visualization needs. To address this problem, we study key technologies for spatial data visualization based on Spark, including data preprocessing, data storage, data partitioning, spatial statistics image generation, and spatial clustering image generation. The specific work is as follows:

(1) First, we analyze spatial data sources. According to their characteristics, spatial data sets are divided into two kinds: human spatial data and natural spatial data. We select one data set for each type, GDELT for human spatial data and LAADS DAAC for natural spatial data, study their distribution characteristics and composition rules, and design a corresponding visualization program for each.

(2) We study the storage of spatial data on a distributed platform with reference to how the visualized image is generated. Two HDFS-based spatial data storage schemes are proposed and designed: default file block hierarchical storage and spatial data partition storage. In the default file block hierarchical storage mode, each compute node generates one layer of the target image, and the final visualized image is produced by superimposing the layers. In the spatial data partition storage mode, each compute node generates one part of the target image, and the final image is produced by splicing the blocks together.

(3) When spatial data is partitioned and stored on a distributed cluster, the data must be spread evenly among nodes to avoid data skew, which can reduce the efficiency of Spark cluster operations. We therefore design and implement two Spark-based homogeneous (evenly balanced) data partitioning algorithms, Hilbert_On_Spark and Merge Re Partition, which improve the efficiency of visualization on clusters.

(4) We propose a general visualization algorithm model based on Spark. Following this model, we design and implement three specific visualization algorithms: scatter plots, frequency plots, and heat maps. To address problems that arise while plotting maps, such as discrete data points, missing data, and skipped color values, we design and implement several optimizations, including data compression, impact factors, missing data repair, and color value normalization.

(5) We implement a DBSCAN spatial clustering visualization algorithm based on Spark. By leveraging Spark's in-memory iterative computing performance, we optimize the traditional spatial clustering algorithm, which relies on the memory of a single-machine computing device. We design and implement the DBSCAN_On_Spark algorithm to speed up the clustering process, together with the visualization algorithm DBSCANMap.

Finally, we set up test environments on both a local single machine and clusters. Using GDELT, LAADS DAAC, and simulated data sets, we evaluate the optimization schemes, the clustering operations, and the efficiency of the visualization algorithms through a number of experiments. The results show that the visual optimization schemes effectively improve output image quality; that Spark-based cluster visualization can complete visualizations that stand-alone devices cannot; and that the general Spark-based visualization algorithm is more efficient than the SpatialHadoop and ArcGIS systems.
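The abstract names Hilbert_On_Spark but does not describe how it works, so the following is only a single-machine sketch of the general idea behind Hilbert-curve partitioning, not the thesis algorithm. Each point is mapped to a grid cell, the cell is given its position along a Hilbert curve, and the sorted points are cut into near-equal chunks so that spatially close points tend to share a partition while partition sizes stay balanced. The function names (`hilbert_index`, `to_cell`, `partition_by_hilbert`), the global longitude/latitude grid, and the default curve order are all assumptions made for illustration.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def to_cell(lon, lat, order):
    """Map a (lon, lat) pair to a grid cell; assumes a global extent (hypothetical choice)."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return x, y


def partition_by_hilbert(points, num_partitions, order=8):
    """Sort points by Hilbert index, then cut into chunks whose sizes differ by at most one."""
    keyed = sorted(
        points,
        key=lambda p: hilbert_index(order, *to_cell(p[0], p[1], order)),
    )
    base, extra = divmod(len(keyed), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)
        parts.append(keyed[start:start + size])
        start += size
    return parts
```

Calling `partition_by_hilbert(points, 4)` on a list of `(lon, lat)` tuples returns four lists of nearly equal length, even when the points are heavily clustered in one region. In a Spark job, the same Hilbert index could plausibly serve as a sort key for range-based partitioning, though how the thesis does this is not stated in the abstract.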
Keywords/Search Tags: Spark, Visualization, Spatial Data, HDFS, DBSCAN