
Research On Storage And Indexing Of Spatiotemporal Big Data Based On HBase Database

Posted on: 2022-04-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Jiang | Full Text: PDF
GTID: 2510306524450154 | Subject: Surveying and Mapping Project
Abstract/Summary:
With the development and spread of the Internet, PC technology and sensor technology, a growing number of advanced technologies and cross-domain applications involve geographic information data, such as digital cities, smart cities and map navigation. As information sources diversify and users demand greater data accuracy, data volumes have grown significantly, so organizing and managing large amounts of spatial data efficiently has become an important problem. Cloud computing has made fast storage and efficient computation over massive data possible. This research studies distributed storage based on the Spark framework under different pre-partition settings, and improves on the original Hilbert curve coding by proposing a multilayer adaptive Hilbert curve coding. A secondary index table is built over the data stored in HBase to support retrieval. The work is as follows:

(1) The thesis first describes the background, current state and foundations of the research, and introduces the related technologies and theory in detail. On this basis, it analyzes the feasibility of the proposed approach to massive data storage and indexing, providing the theoretical and technical support needed for the subsequent work.

(2) Considering the overall structural characteristics of big data and the clustering characteristics of spatiotemporal data, the thesis analyzes the advantages and disadvantages of common data partitioning curves. Its main contribution is to improve the traditional Hilbert-curve-based encoding by designing a multilevel adaptive Hilbert coding method that consists of a classification code and a sorting code. Based on the rowkey design, the column family design and the pre-partitioning of the HBase database, the storage efficiency of 2W, 20W, 100W, 500W, 1000W, 1500W and 2000W records under a single pre-partition is measured (W denotes 10,000, so these range from 20,000 to 20 million records). This is compared with storage under 10 and 51 pre-partitions (51 being the number calculated from the officially provided formula) and with 10W, 100W and 500W records under 100 pre-partitions, in order to test whether storage efficiency for 1000W records varies regularly with the number of pre-partitions and whether the formula-derived number is optimal. The experiments show that more pre-partitions do not necessarily give higher storage efficiency and that no regular pattern emerges; the number calculated from the official pre-partition formula is not the optimal one.

(3) Finally, the thesis uses a database of 100W records as the query target. Exploiting the characteristics of HBase in combination with Phoenix, a secondary index table for the spatiotemporal data is built to support queries on data attributes, and query efficiency with and without the index table is compared. The experiments show that retrieval with the index table is more efficient than retrieval without it. For spatiotemporal range queries, the thesis implements precise range queries by combining filters with the Spark framework, and compares the index-table-based range query with the GeoMesa system, which demonstrates the feasibility and effectiveness of the key techniques studied. In these experiments, the improved Hilbert curve coding algorithm together with the index-table-based spatiotemporal range query algorithm is about 16%-26% more efficient than GeoMesa, and 45-240 times more efficient than the algorithm without an index table.
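The abstract does not give the details of the adaptive coding or of the rowkey layout, so the following is only a minimal sketch of the underlying idea: map a point to a grid cell, encode the cell with a standard fixed-order Hilbert curve, and prefix an HBase-style rowkey with that code so that spatially close records sort close together. The function names (`hilbert_index`, `cell_of`, `make_rowkey`), the grid order, the one-day time bucket and the `code_day_id` layout are all assumptions made for illustration and are not taken from the thesis.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Map cell (x, y) of a 2^order x 2^order grid to its position on the Hilbert curve."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def cell_of(lon: float, lat: float, order: int) -> tuple[int, int]:
    """Snap a longitude/latitude pair to a grid cell (hypothetical global grid)."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return x, y


def make_rowkey(lon: float, lat: float, epoch_seconds: int,
                record_id: str, order: int = 16) -> str:
    """Build an illustrative rowkey: zero-padded Hilbert code, day bucket, record id."""
    x, y = cell_of(lon, lat, order)
    code = hilbert_index(order, x, y)
    width = len(str((1 << (2 * order)) - 1))  # digits needed for the largest code
    day = epoch_seconds // 86400              # assumed one-day time bucket
    return f"{code:0{width}d}_{day:06d}_{record_id}"
```

Zero-padding keeps the lexicographic order of rowkeys consistent with the numeric order of the Hilbert codes, which is what lets a range scan over rowkeys approximate a spatial range scan. The adaptive, multilevel variant described in the abstract would presumably vary the grid order with data density, which this sketch does not attempt.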
Keywords/Search Tags:spatio-temporal big data, Hilbert curve, Hilbert coding, HBase, index