Research On Storage And Query Processing Of Spatio-temporal Data Based On HBase

Posted on: 2020-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: F Qiu
Full Text: PDF
GTID: 2428330602451893
Subject: Computer Science and Technology
Abstract/Summary:
The spread of information technology across industries has led to explosive growth in data; the rapid adoption of mobile devices has accelerated the generation of spatio-temporal data; and advances in hardware and data mining have improved the ability to analyze such data. All of these trends call for more efficient storage and query methods for large-scale spatio-temporal data. Traditional relational databases scale poorly horizontally and are not well suited to large-scale data storage. Distributed systems such as Hadoop and HBase can process and store large-scale data by using the computing and storage capacity of a whole cluster, but they do not directly support the storage and management of spatio-temporal data. To address these problems, this thesis surveys existing work on spatio-temporal data storage, designs LPST-Hash, and implements a prototype system based on HBase. The prototype supports near real-time insertion and batch import of spatio-temporal data, as well as efficient range queries and kNN queries. The main contents of this thesis are as follows:

(1) It studies existing spatio-temporal data storage schemes. Based on the characteristics of spatio-temporal data, the data is divided into different levels along the time dimension. Each level corresponds to a different granularity of the time dimension, which accelerates queries over spatio-temporal data.

(2) It studies and analyzes the characteristics of existing space-filling curves and describes their main problem: the performance impact caused by uneven data distribution. It then gives a solution: dividing the data into partitions according to data density. In data-intensive areas the partitions are finer, and in sparse areas the partitions are coarser, which reduces the performance impact of uneven data distribution. LPST-Hash (Leveled Partitioned Spatio-Temporal Hash) is designed by combining points (1) and (2). LPST-Hash copes better with different time scales in queries and reduces the performance impact of uneven data distribution.

(3) Based on LPST-Hash, which combines leveling and partitioning, a suitable table primary key and column family are designed for HBase. Using HBase coprocessors, this thesis implements the storage of spatio-temporal data and the construction of a spatio-temporal index, and provides both real-time and batch data import methods. Range queries and kNN queries over spatio-temporal data are implemented on top of this index.

Finally, the prototype system is deployed in a laboratory environment. The experiments use the GDELT dataset from April 2013 to December 2018 and a simulated dataset of 100 million normally distributed records, with the open-source spatio-temporal data management tool GeoMesa as the control group. The experimental results show that LPST-Hash outperforms GeoMesa in both range queries and kNN queries on large-scale spatio-temporal data.
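
The abstract does not give the concrete encoding used by LPST-Hash, so the following is only a minimal sketch of the two ideas it names: time leveling and density-based partitioning. It assumes a quadtree-style spatial code in place of whatever space-filling curve the thesis uses. The function names, the `level|time|cell` key layout, the depth limits, the threshold, and the `density` callback are all hypothetical and chosen for illustration.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical time levels: each maps to a coarser or finer time bucket.
TIME_FORMATS = {"year": "%Y", "month": "%Y%m", "day": "%Y%m%d", "hour": "%Y%m%d%H"}


def time_bucket(ts: datetime, level: str) -> str:
    """Truncate a timestamp to the granularity of the given level."""
    return ts.strftime(TIME_FORMATS[level])


def quadtree_cell(lon: float, lat: float, depth: int) -> str:
    """Return a base-4 quadtree path locating (lon, lat) at the given depth."""
    min_lon, max_lon, min_lat, max_lat = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(depth):
        mid_lon = (min_lon + max_lon) / 2
        mid_lat = (min_lat + max_lat) / 2
        quadrant = 0
        if lon >= mid_lon:
            quadrant |= 1
            min_lon = mid_lon
        else:
            max_lon = mid_lon
        if lat >= mid_lat:
            quadrant |= 2
            min_lat = mid_lat
        else:
            max_lat = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)


def adaptive_depth(lon: float, lat: float, density: Callable[[str], int],
                   base_depth: int = 2, max_depth: int = 8,
                   threshold: int = 1000) -> int:
    """Subdivide further while the enclosing cell holds more than `threshold` points.

    `density` is an assumed lookup from a cell path to a point count.
    """
    depth = base_depth
    while depth < max_depth and density(quadtree_cell(lon, lat, depth)) > threshold:
        depth += 1
    return depth


def row_key(ts: datetime, lon: float, lat: float, level: str,
            density: Callable[[str], int]) -> str:
    """Build an illustrative row key: time level, time bucket, adaptive spatial cell."""
    depth = adaptive_depth(lon, lat, density)
    return f"{level}|{time_bucket(ts, level)}|{quadtree_cell(lon, lat, depth)}"
```

In this sketch, a point in a dense region receives a longer cell path (a finer partition) than a point in a sparse region, and the leading time level lets a query choose a bucket size that matches its time span.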
Keywords/Search Tags: spatio-temporal query, spatio-temporal index, HBase, LPST-Hash