
Research on Distributed Storage and Parallel Query of Spatial Data Based on the Hadoop Platform

Posted on: 2017-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: J X Chen
GTID: 2308330485488312
Subject: Computer Science and Technology
Abstract/Summary:
The volume of spatial data continues to grow, and with the wide application of GIS across industries, managing and processing massive spatial data efficiently is becoming increasingly difficult. At the same time, most application areas place increasingly high demands on the accuracy of spatial data. New techniques and methods for managing and processing massive spatial data are therefore urgently needed. Open-source distributed big data platforms, which combine distributed storage with parallel computing, offer a new way to address these problems.

Building on Hadoop, an open-source big data processing platform, this thesis uses HBase's strengths in concurrent access and data processing to store and manage spatial data efficiently, and studies the storage and querying of spatial data in depth. It first reviews the current state of spatial data storage and parallel processing, and then examines the MapReduce parallel computing framework and its operating mechanism together with the principles of the HBase storage model.

Based on a rowkey design, the thesis proposes a new HBase schema for storing and processing spatial data, and drops the filter column family design. Shapefile data is then imported into HBase tables using GeoTools, which is used to construct the spatial data objects, and the thesis analyzes how GeoTools is integrated into spatial data processing under the MapReduce parallel computing framework. Finally, on this basis, it proposes parallel MapReduce solutions for window queries and polygon range queries over spatial data, together with a Geohash-based K-nearest-neighbor query algorithm. Experimental results show that the query algorithms achieve higher efficiency and better accuracy, and demonstrate the advantages of storing and processing spatial data with HBase.
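The abstract does not give the actual rowkey layout, so the following is only a minimal sketch of the general idea of a Geohash-prefixed rowkey, written in Python for brevity. The function names `geohash_encode` and `make_rowkey`, the `_` separator, and the `geohash_featureId` layout are assumptions made for illustration and are not taken from the thesis. The encoding itself is the standard Geohash scheme: longitude and latitude bits are interleaved and every five bits become one base-32 character.

```python
# Standard Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulator for the current 5-bit group
    bit_count = 0     # number of bits collected in the current group
    use_lon = True    # Geohash starts with a longitude bit

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def make_rowkey(lat: float, lon: float, feature_id: str, precision: int = 8) -> str:
    """Hypothetical rowkey layout: <geohash>_<feature id>.

    This layout is an assumption for illustration, not the thesis schema.
    """
    return f"{geohash_encode(lat, lon, precision)}_{feature_id}"
```

Because HBase stores rows sorted by rowkey, a layout of this kind places spatially close objects in adjacent rows, so a region can be read with a prefix scan. Whether the thesis schema works this way is not stated in the abstract.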
The main contributions of this thesis are:
(1) Based on a rowkey design, a new HBase schema is designed for storing and processing spatial data.
(2) GeoTools is used to parse Shapefile-based spatial data, which makes it possible to import and export spatial data.
(3) Window queries and polygon range queries over spatial data are implemented with the MapReduce 2.0 parallel programming framework; experiments show that the parallel algorithms have clear advantages over the traditional query algorithms.
(4) A Geohash-based K-nearest-neighbor query algorithm for spatial data is proposed and used to retrieve the K nearest spatial objects within a region.
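To make the idea of a parallel window query concrete, here is a small sketch in plain Python that mimics the map and reduce phases without any Hadoop dependency. It is not the thesis algorithm: the names `map_window`, `reduce_hits`, and `window_query`, the use of bounding boxes as the feature geometry, and the single `"hit"` key are all assumptions for illustration. Each input split is filtered independently (the part a real MapReduce job would run in parallel), and the reduce step merges the matching feature IDs.

```python
from typing import Iterable, Iterator, List, Tuple

# (min_x, min_y, max_x, max_y); bounding boxes stand in for real geometries.
BBox = Tuple[float, float, float, float]
Feature = Tuple[str, BBox]


def intersects(a: BBox, b: BBox) -> bool:
    """Return True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def map_window(split: Iterable[Feature], window: BBox) -> Iterator[Tuple[str, str]]:
    """Map phase: emit ("hit", feature_id) for features intersecting the window."""
    for feature_id, bbox in split:
        if intersects(bbox, window):
            yield ("hit", feature_id)


def reduce_hits(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Reduce phase: collect the emitted feature IDs into one sorted result list."""
    return sorted(feature_id for _key, feature_id in pairs)


def window_query(splits: Iterable[Iterable[Feature]], window: BBox) -> List[str]:
    """Run the map step over each split in turn, then reduce.

    In a real job the splits would be processed concurrently on different nodes;
    here they are processed sequentially to keep the sketch self-contained.
    """
    emitted: List[Tuple[str, str]] = []
    for split in splits:
        emitted.extend(map_window(split, window))
    return reduce_hits(emitted)
```

A polygon range query could follow the same pattern by replacing `intersects` with an exact geometry test, typically after the bounding-box check has discarded most candidates.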
Keywords/Search Tags: Massive Spatial Data, Distributed Storage and Parallel Computing, HBase Schema, MapReduce, K-Nearest Neighbor Query