With the advancement of location-based service technology, large-scale trajectory data can now be collected. Trajectory data record the historical movement trends and states of mobile objects, and provide information for understanding the behavior of mobile individuals and changes in urban traffic flow. For trajectory data in urban traffic systems, sequential pattern mining is an important tool for discovering underlying patterns and extracting valuable information. Traditional Apriori-based mining models are easy to understand and implement but require substantial computational resources; PrefixSpan-based partitioning models converge quickly but handle long sequences and data containing many kinds of sequences poorly; deep learning-based mining models learn data features well, but training is slow and the resulting patterns are hard to interpret. Based on taxi trajectory data, a spatio-temporal trajectory sequence pattern mining algorithm based on a column-structure auxiliary index model is proposed. On this basis, to address large-scale data mining, the trajectory data are appropriately partitioned and a distributed computing engine is used to parallelize pattern mining. Finally, based on the mined spatio-temporal trajectory sequence patterns, a collaborative route recommendation algorithm that captures the experience and preferences of taxi drivers in urban areas is proposed to provide users with optimized route recommendations. The specific research work is as follows: (1) A trajectory encoding model is proposed. To retain as much trajectory information as possible and facilitate subsequent research, the trajectory encoding model cleans the original GPS trajectory data and analyzes its features, converts the latitude and longitude points into trajectories expressed as road segment sequences, and at the same time encodes and embeds
the speed, time, and license plate number information recorded in the original trajectory. (2) A column-structure auxiliary index model is proposed, and a spatio-temporal trajectory sequence pattern mining algorithm, SPDTI, is implemented on top of it. The algorithm uses double auxiliary indexes to record the transaction information and the temporal relationships between road segments in the column structure of the trajectory data, which makes the storage structure more compact. Based on the prefix projection model, sequential patterns are generated by conditional concatenation of the double auxiliary indexes, which avoids repeated scanning of the dataset. Experiments show that SPDTI achieves better time and space performance than traditional sequential pattern mining algorithms. (3) To support sequential pattern mining on large-scale trajectory data, the D-SPDTI algorithm is proposed, which parallelizes SPDTI on the Spark Dataset distributed data model. To address data skew, a parallel set load balancing data partitioning strategy is proposed to improve execution efficiency in a distributed environment. (4) Based on the spatio-temporal trajectory sequence model, a collaborative path recommendation method that captures the experience and preferences of taxi drivers in urban areas is proposed. The method consists of a collaborative empirical path discovery model (CEPD) and an experience-driven network model (EDN). The CEPD phase performs area-cluster-to-area-cluster road segment retrieval to capture the top-n trajectory sequence patterns with high collaborative experience rank and better distance. The EDN phase iteratively generates an empirical path network for a given O-D condition to support path recommendation. In the
experiments, sensitivity analysis is used to select the optimal parameters. The results show that the recommended path is more reliable than the shortest path and the fastest path, and performs better in terms of travel distance, travel time, and average speed. The column structure and indexing ideas in SPDTI offer a new approach to trajectory sequence pattern mining, and its distributed model can be applied to pattern mining on large-scale trajectory data. The collaborative path recommendation model based on driver experience provides a useful reference for balancing path popularity and reliability, and is of value to the field of path planning.
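
As a rough illustration of the prefix projection idea that SPDTI builds on, the following is a minimal Python sketch of generic prefix-projection mining over road segment sequences. It is not the SPDTI algorithm and does not reproduce its column structure or double auxiliary indexes; the function name `mine_sequential_patterns`, the `max_len` cap, and the toy road segment identifiers are assumptions made purely for illustration. The sketch keeps only a (trajectory id, position) pointer per projected sequence, so each pattern extension scans suffixes rather than the whole dataset.

```python
from collections import defaultdict


def mine_sequential_patterns(trajectories, min_support, max_len=4):
    """Return {pattern tuple: support} for frequent road-segment subsequences.

    A trajectory supports a pattern if the pattern's segments occur in it
    in order (not necessarily adjacent). Hypothetical helper, not SPDTI.
    """
    patterns = {}

    def grow(prefix, projections):
        # projections: list of (trajectory id, index just after the
        # earliest match of the prefix in that trajectory).
        occurrences = defaultdict(dict)  # segment -> {trajectory id: next index}
        for tid, start in projections:
            seq = trajectories[tid]
            for pos in range(start, len(seq)):
                seg = seq[pos]
                # Keep only the earliest occurrence per trajectory.
                if tid not in occurrences[seg]:
                    occurrences[seg][tid] = pos + 1
        for seg, hits in sorted(occurrences.items()):
            if len(hits) < min_support:
                continue  # infrequent extension: prune this branch
            pattern = prefix + (seg,)
            patterns[pattern] = len(hits)
            if len(pattern) < max_len:
                grow(pattern, list(hits.items()))

    grow((), [(tid, 0) for tid in range(len(trajectories))])
    return patterns


# Toy trajectories already encoded as road segment sequences (assumed ids).
example = [
    ["r1", "r2", "r3", "r4"],
    ["r1", "r3", "r4"],
    ["r2", "r3", "r4"],
    ["r1", "r2", "r4"],
]
found = mine_sequential_patterns(example, min_support=3)
```

With a minimum support of 3, the toy data yields the four single segments plus the two-segment patterns ending in `r4`; pairs such as (`r1`, `r2`) are pruned because only two trajectories contain them in order.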