Font Size: a A A

Load-Balanced Parallel Strategies For Geospatial Analysis Over Hybrid CPU And GPU Architecture

Posted on:2019-02-22Degree:DoctorType:Dissertation
Country:ChinaCandidate:C ZhoFull Text:PDF
GTID:1310330545975721Subject:Cartography and Geographic Information System
Abstract/Summary:PDF Full Text Request
Geospatial analysis is always a hot topic of research in many fields such as cartography and geographic information systems(GIS)and remote sensing.Along with the rapid development of earth observing technology,the volume of geographical data is becoming increasingly more massive and accordingly the computational process is becoming more complicated and intensive.In such a case,it is evident that the computational efficiency will be a bottleneck when addressing a large-scale geographical data.Considering the wide use of hardware models of central processing units(CPUs)and graphics processing units(GPUs),it is necessary to adopt parallel computing technology to improve the computational efficiency for geospatial analysis.The main challenge in parallel processing is to achieve balanced workloads between different parallel computing units,which always limits further acceleration improvement of parallel efficiency.Therefore,it is urgent to develop novel load-balanced parallel strategies for geospatial analysis over hybrid CPU and GPU architecture.Even though the existing parallel strategies have achieved considerable acceleration,there are still many critical problems that affect further improvement of parallel efficiency.Therefore,more efforts are needed to explore rational methodologies and approaches for improving parallel efficiency.This dissertation focuses on the design and implementation of load-balanced parallel strategies over hybrid CPU and GPU architecture.More specifically,the load-balanced parallel strategies are put forth for vector polygon-based and raster-based geospatial analysis algorithms separately.In addition,a set of load-balanced parallel approaches is presented for hybrid CPU and GPU architecture.Finally,an adaptive model is developed to scale with different types of parallel environments,geospatial analysis algorithms,and geographical datasets.The main contents of this dissertation are as follows.(1)We first present a decomposition method based on polygon complexity(DMPC)for vector polygon-based geospatial analysis.Conventional methods cannot reflect the actual calculations of various polygons,and usually result into serious data skew during parallel processing.To address this issue,this research constructs polygon complexity models in order to optimize the decomposition process of massive polygon data.The procedures of constructing a complexity model includes filtering possible influential indices,determining actual influential indices,and finally constructing the model using regression analysis method.Moreover,polygons may vary significantly in size,shapes and data structures;and such complicated polygon data will result into data skew during parallel processing.For data-intensive algorithms,a complicated polygon is further divided into a collection of points,polylines,or sub-polygons.For computation-intensive algorithms,a complicated polygon group with intersected polygons is further divided into a collection of polygon pairs.Using this approach,data skew during parallel processing can be further alleviated.The above-mentioned methods are applied to parallelize different polygon-based geospatial analysis algorithms,and the implemented parallel algorithms are tested on a parallel cluster with nine computing nodes.Each node in the cluster contains the following hardware configuration:two Intel(R)Xeon(R)CPUs(E5-2620 clocked at 2.00 GHz,six-core model),and 16 GB of memory.Results show that,when compared to parallel algorithms using conventional data decomposition methods,those using DMPC can achieve much shorter execution time,higher speedup ratios,and more consistently balanced workloads.In addition,results suggest that the proposed DMPC can be well applicable for different types of datasets and polygon-based geospatial analysis algorithms.When addressing a polygon dataset with a volume of 5.5 GB,the implemented parallel polygon rasterization algorithm can reduce total execution time from 1668.45 to 86.95 s,achieving an optimal speedup of 19.19.When processing 1,207,826 polygon groups,the implemented parallel polygon intersection algorithm can reduce total time from 1497.24 to 85.97 s,and achieve a speedup of 17.42.(2)We also raise a parallel method based on actual calculations for raster-based geospatial analysis algorithms.Conventional decomposition methods are coarse when decomposing raster data because they do not fully consider the actual calculations in raster-based geospatial analysis.Moreover,existing task scheduling strategies may easily result into blocks between different parallel computing units,turing out to reduce overall parallel efficiency.To address these issues,we design different data decomposition methods and task scheduling methods for local and global raster-based geospatial analysis,respectively.More specifically,the local raster-based geospatial analysis is typically data-intensive;accordingly,an irregular data decomposition method and a multi-granularity based scheduling method are proposed.The global raster-based geospatial analysis is typically calculation-intensive;accordingly,an improved heuristic data decomposition method and a pick-and-transfer based task scheduling method are proposed to manage and schedule computational tasks during parallel processing.Typical parallel algorithms are implemented and tested on a parallel cluster with nine computing nodes.Results show that,the proposed data decomposition methods and parallel scheduling methods can achieve much shorter execution time as compared to conventional ones.In addition,the proposed parallel methods can also achieve higher speedups and continuously better balanced workloads between different parallel computing units.We use the above parallel methods to achieve the parallel k-means classification algorithm and the parallel raster-to-vector algorithm;and also test the parallel efficiency on the parallel cluster with nine computing nodes.When dealing with a remote sensing image with a volume of 6.9 GB,the implemented parallel k-means classification algorithm can reduce total execution time from 2400.28 to 118.42 s,achieving an optimal speedup of 20.27.When processing a land-use classification raster data with a volume of 3.8 GB,the parallel raster-to-vector algorithm can decrease the total time from 1362.36 to 146.65 s,and reach the highest speedup of 9.29.(3)We finally develop an adaptive load-balanced parallel model over hybrid CPU and GPU architecture.Based on above presented parallel methods,we construct an adaptive load-balanced parallel model(LBPM).This model contains five elements,including geographical data,operator,parallel method,data granularity,and parallel computing environment.To scale with different parallel computing environments,we further put forth three parallel methods,including a hybrid method by using a combination of processes and threads over multi-core CPUs,a parallel CPU/GPU collaboration method,and computing capability based data assignment method.Using these approaches,the computational resources of both CPUs and GPUs can be fully utilized,and a large geographical data can be successfully addressed.In addition,we develop a rapid parallelization method for sequential algorithm and an adaptive load-balanced method.Using these developed methods,the LBPM model can adaptively choose parallel methods and data granularities according to the specified datasets,sequential algorithms,and parallel computing environment.Based on LBPM,we ultimately develop a load-balanced parallel platform to run and test all of the parallel algorithms that we have already implemented.Results show that the hybrid parallel method and CPU/GPU collaboration method can achieve appropriate parallel efficiency over multi-core CPUs and hybrid CPU and GPU architecture.In addition,results indicate that the LBPM can be well applicable for different parallel computing environments,different types of algorithms,and different volumes of geographical datasets.In summary,the innovations of this research rely on three perspectives.First,we present novel methods for construsting polygon complexity models and subdividing complicated polygons.Using these approaches,polygon data can be rationally assigned and addressed in parallel.Second,we propose a set of load-balanced parallel techniques with multiple computational granularities for raster-based geospatial analysis algorithms.With the help of these techniques,actual computations can be well decomposed and assigned.Finally,we develop novel load-balanced parallel approaches for hybrid CPU and GPU architecture,so as to address massive geographical data and reduce the time spent on data reads/writes.
Keywords/Search Tags:GIS, geospatial analysis, high-performance geo-computation, spatial data decomposition, workload balance
PDF Full Text Request
Related items