Research On GML Spatial Data Partitioning Strategies And Algorithms In A Distributed Parallel Computing Environment

Posted on:2013-01-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Peng

GTID:2240330377453460

Subject:Cartography and Geographic Information System

Abstract/Summary:

GML is widely used because of its simplicity, semi-structured nature, interoperability, openness, generality and flexibility. As GIS problems grow more complex and larger in scale, traditional GIS spatial data storage and spatial analysis algorithms can no longer meet the demands of massive data storage and analysis. Distributed parallel computing offers a way to address this problem. Its performance depends largely on the data partitioning strategy, yet current data partitioning algorithms do not take spatial relationships into account. This thesis therefore studies spatial data partitioning algorithms suited to GML, considering data balance across nodes, adjacency of spatial objects, area balance and spatial relationships. The main contributions are as follows:
First, it identifies the shortcomings of spatial data partitioning based on the Hilbert curve and on the K-means clustering algorithm. The former does not balance the area of the spatial data well across nodes, and the latter may produce poor results when the initial centroids are badly chosen.
Second, it combines the Hilbert curve with K-means clustering to propose a new GML data partitioning algorithm that takes load balance, adjacency, area balance and spatial relationships into consideration.
Finally, based on the proposed algorithm, it designs a GML distributed storage system and implements the data partitioning module of a distributed parallel GML storage system on the Hadoop platform. The experiments verify the data balance on each node, compare the parallel speedup ratio of the algorithm with that of partitioning based on Oracle Spatial and on K-means clustering, and compare its query time for equal-area regions with that of partitioning based on the Hilbert curve.
The results show that the proposed data partitioning algorithm achieves good load balance and parallel query efficiency.
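
The abstract does not say how the Hilbert curve and K-means are combined, so the following is only an illustrative sketch under one plausible assumption: feature centroids are ordered along a Hilbert curve, the ordering is cut into k equal-count runs, and the mean of each run seeds a standard K-means (Lloyd) refinement. The function names `hilbert_index` and `hilbert_seeded_kmeans`, the grid `order`, and the use of plain 2D points in place of parsed GML geometries are all hypothetical and not taken from the thesis.

```python
def hilbert_index(order, x, y):
    """Map integer cell (x, y) on a 2**order by 2**order grid to its
    position along the Hilbert curve (standard xy-to-d conversion)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def _mean(points, idxs):
    return (sum(points[i][0] for i in idxs) / len(idxs),
            sum(points[i][1] for i in idxs) / len(idxs))


def hilbert_seeded_kmeans(points, k, order=8, iterations=20):
    """Assumed combination (not the thesis's algorithm): seed K-means
    with the means of k equal-count runs along the Hilbert curve.
    Returns (labels, centroids); labels[i] is the partition of points[i]."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    n = 1 << order
    min_x = min(p[0] for p in points)
    max_x = max(p[0] for p in points)
    min_y = min(p[1] for p in points)
    max_y = max(p[1] for p in points)

    def cell(v, lo, hi):
        # Scale a coordinate into an integer grid cell in [0, n - 1].
        if hi == lo:
            return 0
        return min(n - 1, int((v - lo) / (hi - lo) * n))

    ranked = sorted(
        range(len(points)),
        key=lambda i: hilbert_index(order,
                                    cell(points[i][0], min_x, max_x),
                                    cell(points[i][1], min_y, max_y)))
    # Equal-count runs along the curve give balanced, spatially local seeds.
    m = len(points)
    runs = [ranked[j * m // k:(j + 1) * m // k] for j in range(k)]
    centroids = [_mean(points, run) for run in runs]

    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                              + (p[1] - centroids[c][1]) ** 2)
            for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = _mean(points, members)
    return labels, centroids
```

In this sketch, `labels, centroids = hilbert_seeded_kmeans(points, k)` would assign each feature centroid to one of k partitions (for example, one per storage node). Seeding from the curve is meant to avoid the bad-initial-centroid problem the abstract attributes to plain K-means. It does not enforce area balance or model spatial relationships, which the thesis's algorithm is said to handle.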

Keywords/Search Tags:

GML, distributed computing, parallel computing, data partitioning, Hadoop

Related items

1	Marine Environmental Numerical Prediction Data Processing Method Based On The Construction Of Synergistic And Parallel Computing
2	The Key Techniques Of Cloud GIS Based On Hadoop
3	Research On Reverse Time Migration Data Processing Method Based On Cloud Computing
4	Integration And Development Of Natural Resource Spatial Data Application Platform Based On Hadoop
5	Research On Graph Partitioning In Distributed Graph Computing
6	Research On Key Technologies In Geographic Information Management Based On Hadoop
7	Research On Parallel Computing And Remote Sensing Data Generation Method For Distributed Hydrological Simulation
8	Research On Distributed Computing Of Raster Big Data Based On GeoTrellis
9	Study On The Suitability Of Data Partitioning Granularity For Parallel Computing Of POIs Generalization
10	Cloud Computing Model Based On Hadoop And Meteorological Application