Research On Index And Query Technology Of HBase For Spatiotemporal Data

Posted on: 2021-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zou  Full Text: PDF
GTID: 2518306107986349  Subject: Computer Science and Technology
Abstract/Summary:
Spatiotemporal data is a kind of big data that carries time, space, and other attributes, and it plays an active role in production and daily life. Studying the efficient storage and query of massive spatiotemporal data is therefore of great significance and value. Distributed storage systems handle big data better than traditional relational databases. Among them, HBase has been widely researched and applied because it is open source, highly reliable, and highly scalable. However, HBase currently indexes only the rowkey and provides neither a spatiotemporal index nor a secondary index, so it cannot directly meet the requirements of efficient spatiotemporal queries and conditional queries on spatiotemporal data. Based on the characteristics of the existing index and query technology of HBase, this thesis proposes an index and query optimization scheme of HBase for spatiotemporal data, which is divided into three parts: a storage model, an index model, and a query method. The storage model provides reasonable storage of spatiotemporal data. The index model covers the design and management of the spatiotemporal index and the secondary index. The query method provides fast spatiotemporal queries and conditional queries. The main research contents are as follows:

(1) The storage model of HBase for spatiotemporal data. To store spatiotemporal data reasonably, geohash codes are used to divide space into grids, and a grid partition method based on historical statistics is proposed to address the hot data problem.

(2) The hierarchical spatiotemporal index model (HSTIndex). To improve the performance of spatiotemporal queries, a global index layer based on the Meta table and a local index layer based on Region are designed, both of which filter data by spatiotemporal information.

(3) The classification secondary index model (CSIndex). To improve the performance of conditional queries, different in-memory index structures, including bitmap, hash, and BD-tree, are designed according to the data characteristics and query requirements of the other attributes. In addition, an index management mechanism based on Observer is proposed for the effective management of HSTIndex and CSIndex.

(4) The query method of HBase for spatiotemporal data. Combining the storage model and the index model, a parallel query mechanism based on Endpoint is proposed. On this basis, optimization algorithms for spatiotemporal range queries and k-nearest neighbor queries based on HSTIndex, and an optimization algorithm for conditional queries based on CSIndex, are designed.

Finally, comparative experiments are carried out on a real-world taxi trajectory dataset. The results show that the grid partition method performs better than the traditional method, that spatiotemporal queries based on HSTIndex perform better than STEHIX (Spatio-Temporal HBase Index), and that conditional queries based on CSIndex are significantly faster than the Solr-based scheme and HiBase (Hierarchical-indexed HBase). At the same time, the time and space overhead is acceptable. In general, the optimization scheme in this thesis improves the overall storage and query performance for spatiotemporal data and has certain application value.
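
The abstract states that geohash codes are used to divide space into grids but gives no details of the encoding or the rowkey layout. As a rough illustration only, the following Python sketch implements the standard public geohash encoding and combines it with a timestamp into a rowkey. The function names `geohash_encode` and `make_rowkey`, the `_` separator, and the 13-digit millisecond timestamp are assumptions made for this example, not the thesis's actual design, and the grid partition method based on historical statistics is not reproduced here.

```python
# Standard geohash base32 alphabet (digits plus letters without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash string.

    Bits alternate between longitude and latitude, starting with
    longitude; every 5 bits map to one base32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def make_rowkey(lat: float, lon: float, ts_ms: int, precision: int = 6) -> str:
    """Hypothetical rowkey layout: <geohash>_<zero-padded epoch millis>.

    Points in the same grid cell share a rowkey prefix, so a prefix
    scan over the cell returns them in time order.
    """
    return f"{geohash_encode(lat, lon, precision)}_{ts_ms:013d}"
```

Because a shorter geohash is always a prefix of a longer one for the same point, choosing the `precision` value amounts to choosing the grid size. A coarser grid groups more points under one prefix, which is one reason hot cells can arise and why a partition method such as the one proposed in the thesis may be needed.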
Keywords/Search Tags: Spatiotemporal Data, HBase, Spatiotemporal Index, Secondary Index