Research On Distributed Trajectory Data Index And Query Technology

Posted on: 2022-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Xie
GTID: 2518306605468014
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of positioning technology and the spread of informatization across industries, a large amount of trajectory data is being generated. Trajectory data touches every aspect of daily life. People carry mobile devices such as phones and smart wearables, so human movement generates trajectory data. With the popularity of GPS equipment, many vehicles are fitted with positioning devices, so vehicle movement generates trajectory data as well. Zoologists analyze the migration trajectories of animals, and meteorologists study the movement of natural phenomena such as typhoons. Together these sources explain the explosive growth of trajectory data, and storing and analyzing data at this scale has become a serious problem.

Traditional relational databases are inherently limited when handling very large data sets and are difficult to scale horizontally. Distributed platforms provide strong technical support for processing massive trajectory data: a distributed system can use the computing and storage capacity of every node in the cluster to process and store large-scale data more efficiently. After years of development, the Hadoop ecosystem includes many components, such as HBase, Hive, Pig, and ZooKeeper. This thesis combines the characteristics of trajectory data with HBase, a non-relational distributed database, as the storage platform, in order to store trajectory data reliably and query it quickly. Based on the characteristics of HBase, the thesis designs the DBT-Hash index scheme and builds a trajectory data query system on the structure of that index. The system supports importing trajectory data, both as real-time data and as batches of historical data. Given query conditions, users can run trajectory queries, range queries, space-time circle queries, KNN queries, and most-similar trajectory queries on the trajectory data.

The work of this thesis is as follows:

(1) A grid indexing scheme for trajectory data on the HBase platform is designed and implemented. Space and time are partitioned into grid cells along three dimensions (time, longitude, and latitude), and each cell records the IDs of the trajectories that intersect it. A query first computes the grid cells involved and then applies fine-grained filtering within those cells to obtain reliable results.

(2) Existing space-filling curves are studied in light of HBase's LSM-tree storage and the lexicographic ordering of its row keys. The thesis uses a time prefix followed by a Geohash code as the index code of a grid cell, and stores data redundantly along the time dimension to make queries efficient, that is, it trades space for time.

(3) To support finer time granularity in queries, the thesis implements an efficient time-segment splitting algorithm that accurately splits a query interval into the smallest time intervals, reducing the number of HBase scans. It also designs a common-prefix matching algorithm based on Geohash codes which, combined with the DBT-Hash index scheme, accelerates several query types.

(4) Because servers generally perform better than clients, the thesis filters data on the server side with HBase custom filters to speed up queries. This also reduces the amount of redundant data transmitted over the network.

Finally, the GeoLife 1.3 data set collected by Microsoft Research Asia is used as the experimental data, and two data import methods are designed: real-time import and batch import of historical data. A comparable scheme is selected as the control. The best grid division level is determined first, and the experimental analysis is then carried out. The experimental results show that DBT-Hash outperforms the other schemes in trajectory queries, range queries, space-time circle queries, and KNN queries.
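
The "time prefix + Geohash" index code in contribution (2) and the common-prefix matching in contribution (3) can be illustrated with a minimal Python sketch. The abstract does not specify the actual DBT-Hash key layout, so everything below is an assumption made for illustration: the one-hour bucket size, the zero-padded decimal prefix, the `_` separator, and the function names `time_prefix`, `row_key`, and `common_prefix` are all hypothetical. Only the Geohash encoding itself follows the standard public algorithm.

```python
# Illustrative sketch only; the key layout is assumed, not taken from the thesis.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Standard Geohash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def time_prefix(epoch_seconds, bucket_seconds=3600, width=10):
    """Hypothetical time prefix: zero-padded bucket number, so that
    lexicographic key order matches chronological order."""
    return str(int(epoch_seconds) // bucket_seconds).zfill(width)


def row_key(epoch_seconds, lat, lon, precision=6):
    """Hypothetical row key: <time bucket>_<geohash of the grid cell>."""
    return time_prefix(epoch_seconds) + "_" + geohash_encode(lat, lon, precision)


def common_prefix(codes):
    """Longest prefix shared by all Geohash codes. Comparing only the
    lexicographic minimum and maximum is enough."""
    if not codes:
        return ""
    lo, hi = min(codes), max(codes)
    for i, ch in enumerate(lo):
        if ch != hi[i]:
            return lo[:i]
    return lo


if __name__ == "__main__":
    print(row_key(1_200_000_000, 39.9042, 116.4074))
    print(common_prefix(["wx4g0", "wx4g1", "wx4ep"]))
```

Under these assumptions, all cells in one time bucket whose Geohash codes share a prefix sort next to each other, so a spatial range within that bucket could be served by a single prefix scan on `<time bucket>_<common prefix>` rather than one scan per cell.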
Keywords: trajectory data, HBase, grid index, spatiotemporal query, DBT-Hash