
Hadoop-based Geospatial Data Storage And Query Technology

Posted on: 2018-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: S J Zhang
GTID: 2348330521950950
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of geographic information technology, the volume of geospatial data has grown quickly, and traditional storage and processing methods struggle to meet demand. Storing and processing massive geospatial data efficiently has become a focus of major IT companies and well-known academic institutions. Against this background, this thesis studies geospatial data storage and query technology based on Hadoop, addressing the complex structure, large volume, prominent spatial features, and complicated storage and processing of geospatial data. The main contents and innovations of this thesis are as follows:
(1) This thesis designs a new HBase table schema suited to vector spatial data storage and processing, taking into account the structural characteristics of vector spatial data and the storage and retrieval characteristics of the NoSQL database HBase. The row keys and column families of the HBase table are specially designed to meet the requirements of vector spatial data storage and query.
(2) Several data import algorithms are proposed, including two stand-alone import algorithms and two distributed import algorithms, which satisfy the import requirements of multiple application scenarios. The technology supports importing Shapefile files by default and can be easily extended to support other vector spatial data formats.
(3) This thesis designs an R-Tree index storage model that speeds up query processing of vector spatial data. The model is based on HDFS and contains a two-level index structure. The thesis also designs and implements several optimization strategies to reduce query latency.
(4) Through in-depth analysis of the R-Tree index, this thesis proposes a data sampling algorithm and a MapReduce-based R-Tree index creation algorithm, and optimizes the storage of the R-Tree index to speed up query processing.
(5) Based on a study of HBase coprocessors, this thesis proposes range query, K-nearest neighbor query, and spatial join query algorithms built on the R-Tree index and the HBase database, and designs a corresponding batch query optimization algorithm for each.
Performance comparison tests and analysis in a laboratory environment, based on the above design, show that the Hadoop-based geospatial data storage and query technology provides better query performance for massive geospatial data.
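The two-level index idea in item (3) and the range query in item (5) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: plain dictionaries stand in for R-Tree nodes stored on HDFS, and the names `MBR`, `global_index`, `local_index`, and `range_query`, as well as the sample features, are hypothetical. It only shows the general filter pattern, where a coarse first-level check on partition bounding rectangles prunes partitions before the second-level entries are tested.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MBR:
    """Minimum bounding rectangle of a feature or a partition."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "MBR") -> bool:
        # Two rectangles overlap when they overlap on both axes.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


def range_query(global_index: Dict[str, MBR],
                local_index: Dict[str, List[Tuple[str, MBR]]],
                window: MBR) -> List[str]:
    """Return ids of features whose MBR intersects the query window."""
    hits = []
    for partition_id, partition_mbr in global_index.items():
        # First level: skip whole partitions that cannot contain a match.
        if not partition_mbr.intersects(window):
            continue
        # Second level: test each feature MBR inside the surviving partition.
        for feature_id, feature_mbr in local_index[partition_id]:
            if feature_mbr.intersects(window):
                hits.append(feature_id)
    return sorted(hits)


# Hypothetical sample data: two partitions, three features.
global_index = {"p0": MBR(0, 0, 10, 10), "p1": MBR(10, 0, 20, 10)}
local_index = {
    "p0": [("road-1", MBR(1, 1, 2, 2)), ("lake-7", MBR(8, 8, 9.5, 9.5))],
    "p1": [("park-3", MBR(12, 1, 14, 3))],
}
result = range_query(global_index, local_index, MBR(7, 7, 9, 9))
print(result)
```

In a real deployment the second-level lookup would traverse an R-Tree rather than scan a list, and candidate features would still need an exact geometry test after the bounding-rectangle filter.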
Keywords/Search Tags:Range query, K-nearest neighbor query, Spatial join query, R-Tree index, Hadoop, Geospatial data