Design And Implementation Of Trajectory Query Method Based On Distributed Index

Posted on: 2024-09-09    Degree: Master    Type: Thesis
Country: China    Candidate: L M Wang    Full Text: PDF
GTID: 2530307121483434    Subject: Electronic information
Abstract/Summary:
With the rapid development of the urban economy, more and more buses, taxis, and ride-hailing services serve millions of urban residents every day. Hundreds of thousands of vehicles generate massive amounts of GPS location data at every moment, and storing and managing this data is the first problem practitioners face. Traditional database and software technologies can no longer meet the storage and indexing needs of large-scale GPS trajectories in first-tier cities, so researchers have turned to big data technology to store and index massive GPS trajectory data. A GPS trajectory is a series of points carrying time and spatial location information. It captures not only the trips of individual vehicles but also the travel and spatiotemporal characteristics of millions of urban residents. Choosing a suitable index structure to accelerate GPS trajectory queries is a challenging problem.

Classical trajectory indexing methods, such as the Suffix Tree, 3D R-Tree, and CSE-Tree, are all implemented on a single node and can handle only a limited number of GPS trajectories. This thesis uses the RDD component of the Spark big data computing engine to build distributed implementations of these classic trajectory indexes, and evaluates their performance in detail to select the best indexing and partitioning method. The process has three steps. First, the data is preprocessed: the original GPS records are refined through data cleaning, trajectory segmentation, and map matching. Second, a global partitioning of the GPS trajectories is constructed from their temporal and spatial information. Finally, a trajectory index is built locally within each partition, giving a trajectory storage structure that combines a global index with local indexes.

We evaluated this method with trajectory point queries, region queries, and substring queries under spatial and spatiotemporal grid partitioning, on a 13-node Spark cluster with one month of GPS data (114 GB in total) from 30,000 taxis. The experimental results show the following. In terms of latency, under both spatial and spatiotemporal grid partitioning, the query latency for trajectory point and substring queries ranks Suffix Tree > 3D R-Tree, and the query latency for region queries ranks Suffix Tree > 3D R-Tree > CSE-Tree. In terms of parameters, query latency is influenced by environment parameters such as the number of cores and the amount of memory, as well as by the query conditions. In terms of query support, both the Suffix Tree and the 3D R-Tree can handle trajectory point queries, region queries, and substring queries, whereas the CSE-Tree can handle only region queries. With the 3D R-Tree on a 7-day dataset under the time grid partitioning method, the average query latency in the default environment and query conditions is 2.33 seconds for trajectory point queries, 3.33 seconds for substring queries, and 6.88 seconds for region queries.
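
The two-level structure described in the abstract (a global spatiotemporal grid partition plus a local index inside each partition) can be illustrated with a minimal single-process sketch. It is not the thesis's Spark implementation. The names `GpsPoint`, `cell_of`, `build_index`, and `region_query` and the cell sizes `CELL_DEG` and `CELL_SEC` are all assumptions made for illustration, and a time-sorted list stands in for the local 3D R-Tree.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import NamedTuple


class GpsPoint(NamedTuple):
    traj_id: str
    t: int        # timestamp in seconds
    lon: float
    lat: float


# Hypothetical grid parameters; the abstract does not state the cell sizes.
CELL_DEG = 0.1    # spatial cell edge, in degrees
CELL_SEC = 3600   # temporal cell length, in seconds


def cell_of(t, lon, lat):
    """Map a (t, lon, lat) coordinate to its spatiotemporal grid cell."""
    return (math.floor(t / CELL_SEC),
            math.floor(lon / CELL_DEG),
            math.floor(lat / CELL_DEG))


def build_index(points):
    """Global step: group points by grid cell.
    Local step: sort each cell by time (a stand-in for a real local index)."""
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p.t, p.lon, p.lat)].append(p)
    index = {}
    for cell, pts in cells.items():
        pts.sort(key=lambda p: p.t)
        index[cell] = ([p.t for p in pts], pts)
    return index


def region_query(index, t_range, lon_range, lat_range):
    """Return ids of trajectories with at least one point inside the box."""
    lo = cell_of(t_range[0], lon_range[0], lat_range[0])
    hi = cell_of(t_range[1], lon_range[1], lat_range[1])
    hits = set()
    # Global pruning: visit only the grid cells that overlap the query box.
    for ct in range(lo[0], hi[0] + 1):
        for cx in range(lo[1], hi[1] + 1):
            for cy in range(lo[2], hi[2] + 1):
                times, pts = index.get((ct, cx, cy), ([], []))
                # Local lookup: binary search on time, then filter on space.
                start = bisect_left(times, t_range[0])
                end = bisect_right(times, t_range[1])
                for p in pts[start:end]:
                    if (lon_range[0] <= p.lon <= lon_range[1]
                            and lat_range[0] <= p.lat <= lat_range[1]):
                        hits.add(p.traj_id)
    return sorted(hits)


points = [
    GpsPoint("taxi-1", 100, 116.31, 39.92),
    GpsPoint("taxi-1", 160, 116.33, 39.93),
    GpsPoint("taxi-2", 120, 116.52, 39.95),
    GpsPoint("taxi-3", 7300, 116.32, 39.92),
]
index = build_index(points)
result = region_query(index, (0, 600), (116.30, 116.40), (39.90, 40.00))
print(result)  # ['taxi-1']
```

In a distributed setting the grid cell would plausibly serve as the partition key, so that a query contacts only the partitions whose cells overlap the query box. How the thesis maps cells to Spark partitions is not stated in the abstract.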
Keywords/Search Tags: trajectory data query, Spark, Suffix Tree, 3D R-Tree, CSE-Tree, partition, indexes