| GML is widely used because of its simplicity, semi-structured nature, interoperability, openness, generality, and flexibility. As GIS problems grow more complex and larger in scale, traditional GIS spatial data storage and spatial analysis algorithms can no longer meet the demands of massive data storage and spatial analysis. Distributed and parallel computing can address this problem. Its performance depends largely on the data partitioning strategy, yet current data partitioning algorithms do not take spatial relationships into account. This paper therefore studies spatial data partitioning algorithms suited to GML, considering the data balance on each node, the adjacency of spatial objects, area balance, and spatial relationships. The main contributions are as follows. First, it points out the disadvantages of spatial data partitioning based on the Hilbert curve and on the K-means clustering algorithm: the former balances the area of spatial data across nodes poorly, and the latter may produce a poor result when the initial centroids are badly chosen. Second, by combining the Hilbert curve with the K-means clustering algorithm, it proposes a new GML data partitioning algorithm that takes load balance, adjacency, area balance, and spatial relationships into consideration. Finally, based on the proposed algorithm, it designs a distributed GML storage system and implements the data partitioning module of a distributed parallel GML storage system on the Hadoop platform. The experiments verify the data balance on each node, compare the parallel speedup ratio of this algorithm with that of approaches based on Oracle Spatial and on the K-means clustering algorithm, and compare its query time for areas of the same size with that of the Hilbert-curve-based approach. 
The results show that this data partitioning algorithm achieves good load balance and parallel query efficiency. |
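
To make the Hilbert-curve baseline mentioned in the abstract more concrete, the following is a minimal sketch of how spatial points might be ordered along a Hilbert curve and cut into equally sized partitions. It is not the algorithm proposed in the paper: the function names `hilbert_index` and `partition_by_hilbert`, the grid resolution `order`, and the use of plain point coordinates instead of GML features are all assumptions made for illustration. The sketch balances only the object count per partition, which is exactly why such a scheme can leave the covered area per node unbalanced.

```python
def hilbert_index(order, x, y):
    """Return the Hilbert-curve distance of cell (x, y) in a 2**order grid.

    Standard iterative xy-to-distance conversion; x and y must lie in
    the range 0 .. 2**order - 1.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def partition_by_hilbert(points, k, order=8):
    """Split (x, y) points into k partitions of near-equal count.

    Hypothetical helper: points are snapped to a 2**order grid over their
    bounding box, sorted by Hilbert distance, and cut into contiguous runs,
    so points that are close in space tend to share a partition.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not points:
        return [[] for _ in range(k)]

    n = 1 << order
    min_x = min(p[0] for p in points)
    max_x = max(p[0] for p in points)
    min_y = min(p[1] for p in points)
    max_y = max(p[1] for p in points)

    def to_cell(v, lo, hi):
        # Map a coordinate to a grid cell, clamping the upper boundary.
        if hi == lo:
            return 0
        return min(n - 1, int((v - lo) / (hi - lo) * n))

    def key(p):
        return hilbert_index(order,
                             to_cell(p[0], min_x, max_x),
                             to_cell(p[1], min_y, max_y))

    ordered = sorted(points, key=key)

    # Distribute the remainder so partition sizes differ by at most one.
    base, extra = divmod(len(ordered), k)
    parts, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        parts.append(ordered[start:start + size])
        start += size
    return parts
```

Each returned partition could then be assigned to one storage node. A combined scheme along the lines the abstract describes would additionally have to account for area balance and spatial relationships, which this sketch deliberately leaves out.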