Parallel Query Processing On Trajectory Data

Posted on: 2019-02-10
Degree: Doctor
Type: Dissertation
Country: China
GTID: 1368330566497846
Subject: Computer software and theory
Abstract/Summary:
In recent years, big data has grown at an unprecedented rate from sources such as archives, documents, media, sensor data, social media, business applications, the public web, data storage, and machine log data. This data has become too complex and too dynamic to process, store, analyze, and manage with traditional data tools, so its computation and analysis have been widely studied. Global Positioning System (GPS) tools and the sensor technologies used by geographic information systems (GIS) collect massive positional data that records the history of moving objects; these records are called trajectories. A trajectory is an ordered sequence of locations, where each location is a point of interest with an associated description such as latitude, longitude, name, and other text such as activities. Trajectory databases (TJDBs) archive trajectories together with their text descriptions and the paths that connect their points.

Processing such big data to discover knowledge and support decision-making is hard, so extracting trajectories and processing queries over them is a difficult and significant challenge. Moreover, given the massive number of moving objects in TJDBs, query processing needs an efficient method of traversing the TJDBs in order to extract knowledge and discover matching trajectories. This requires more computation than the centralized methods used previously can provide, and the issue has therefore become important in academic research as well as in industry. Since none of the existing studies can handle certain special query problems related to users' lives, this dissertation addresses these challenges in the trajectory domain and proposes effective methodologies, balanced indexes for trajectory data management, and highly efficient query algorithms. It conducts exhaustive evaluations through comprehensive empirical analysis, using new quantitative approaches on a distributed system to accelerate the computation, and lays foundations for future research directions. Compared with previous methods, the experimental evaluations confirm the efficiency and performance of the proposed index and algorithms.

This dissertation includes five chapters:

Chapter 1 introduces the main research topics and objectives of our study, together with a general discussion of recent related work. It presents general definitions and discussions of similarity measure methods, spatial query processing types, and trajectory data processing over large trajectory databases (TJDBs). In addition, we consider how these mechanisms apply to trajectory indexing and query problems. The chapter concludes with general arguments that establish the position of our work.

Chapter 2 presents parallel trajectory data management and processing over a Distributed R-tree index (DTR-Tree). This chapter organizes big trajectory data into a set of indexes on a distributed platform, where each index is located on a separate machine in the cluster. The index handles data storage as well as data maintenance. Based on the proposed index, an efficient algorithm is developed to process the trajectory top-K query. The top-K query is based on a distance threshold and a set of keywords, and aims to find the shortest trajectories that include the required activity keywords in an order-sensitive manner. To optimize the query algorithm, an efficient pruning method is proposed for traversing the indexes.

Chapter 3 focuses on processing the frequent trajectory query. In this study, the indexes located on different machines in the cluster store both geo-location objects and their frequent activity texts. The chapter therefore runs a distributed data-mining algorithm (the Apriori algorithm) over each trajectory object and its activities. To select the frequent local activities with their supports, the data-mining algorithm is applied to each object, or point of interest (POI), stored in the leaf nodes of each R-Tree partition. The strong association rules are then constructed and their confidences are computed. These results are stored in a massive Mining Inverted List (MIL), which is optimized using a traceability method in order to reduce its size. Finally, to process the proposed query, an efficient parallel query-processing algorithm composed of two steps is developed. The first step prunes the search space efficiently by traversing the corresponding separate indexes simultaneously. The second step simultaneously chooses the best trajectory using the optimized MIL.

Chapter 4 presents the trajectory skyline query based on activities and distance search. It processes this query on both dimensions using two efficient functions. The first function evaluates multi-frequent activity retrieval using similarity measures between the query activities and the activities included in the trajectory objects. The second function evaluates geo-location retrieval, measured between the location of the trajectory activity object and the query. To process the trajectory query, we propose a parallel algorithm that solves the problem efficiently, based on the proposed functions and on the distributed index DMTR-Tree with the inverted files developed in the previous chapter. Furthermore, to handle the incomplete activity data problem, which arises during dataset storage, we propose modifications to both functions. An efficient parallel trajectory algorithm is then developed to handle the incomplete activity problem as well as the skyline query.
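The skyline idea in Chapter 4 can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the `Candidate` type, its two score fields, and the function names are hypothetical, and the scores are assumed to be already computed by the activity and distance functions. The sketch only shows the dominance test on two dimensions and why per-partition skylines can be merged into a global one.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical summary of one trajectory's two query scores."""
    traj_id: str
    distance: float      # spatial distance to the query; lower is better
    activity_sim: float  # activity similarity to the query; higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both dimensions and strictly better on one."""
    no_worse = a.distance <= b.distance and a.activity_sim >= b.activity_sim
    strictly_better = a.distance < b.distance or a.activity_sim > b.activity_sim
    return no_worse and strictly_better


def skyline(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates not dominated by any other (naive O(n^2) scan)."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]


def parallel_skyline(partitions: List[List[Candidate]]) -> List[Candidate]:
    """Compute a local skyline per partition, then a global skyline over their union.

    The partitions stand in for the per-machine indexes; here they are
    processed sequentially for simplicity.
    """
    local = [c for part in partitions for c in skyline(part)]
    return skyline(local)
```

The merge is safe because any trajectory on the global skyline is also undominated within its own partition, so it survives the local step. The sketch ignores index pruning and the incomplete activity data case described above.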
Keywords/Search Tags: Trajectory, GPS Data, Activity, Distributed Computing, Top-k Query Processing, Skyline Query