| Trajectory similarity computation is widely used in travel recommendation, epidemiological detection, criminal investigation, and other fields as a technique for mining similar behaviors of mobile objects. The large scale, rapid growth, and diverse data structures of trajectory big data place higher demands on similarity analysis and similarity pattern mining. How to analyze and mine similar behaviors of mobile objects while jointly considering factors such as trajectory scale and complex semantics has become a research hotspot in trajectory big data similarity computation. Taking trajectory big data as the research object, this thesis investigates the theory and technology of multidimensional semantic trajectory similarity computation and addresses the efficiency and accuracy problems of similarity computation over trajectory big data. The main research contents and achievements are as follows: 1. Research on a trajectory similarity query method based on a multidimensional semantic association-aware data partitioning model. To address the low efficiency of trajectory similarity queries caused by redundant cross-node transmission of large-scale static multidimensional semantic trajectories, this thesis studies a trajectory similarity query method based on a multidimensional semantic association-aware data partitioning model. A multidimensional semantic trajectory similarity metric function (Multi-Dimensional Similarity, MDS) is designed, which computes multidimensional semantic trajectory similarity as a weighted linear combination of textual, temporal, and spatial similarity, overcoming the limitations of per-dimension similarity threshold constraints. A multidimensional semantic association-aware data partitioning model (Multi-Dimensional Partitioning, MDP) is proposed. Based on the MDS metric function, the textual 
item, start-end time distance, and trajectory segment-trajectory spatial distance pruning theorems are derived. The upper and lower bounds given by the pruning theorems are used to group text items, slice temporal start points, slice temporal end points, and partition trajectory segments, so that textually, temporally, and spatially similar trajectories are stored in the same partitions. This solves the problem of redundant cross-node transmission of large-scale static multidimensional semantic trajectories. A distributed indexing model based on MDP is proposed: a local index with an R-tree variant structure, in which each internal node retains all trajectory IDs contained in its subtree, efficiently organizes the data within a partition, and a global index with a four-layer tree structure, built using hash mapping, provides distributed management of the partitions. A trajectory similarity query algorithm based on the distributed index is designed, which uses the global index to locate partitions containing similar trajectories as candidate partitions and then uses the local indexes to find similar trajectories in each candidate partition in parallel. Query time is measured for the proposed method and other methods on several datasets, and the experimental results show that the proposed method effectively reduces trajectory similarity query time. 2. Research on an incremental partitioning method based on a trajectory similarity query cost-aware model. To address the inefficiency of incremental partitioning caused by unsuitable modeling of dynamic multidimensional semantic trajectory data partitioning, this thesis studies an incremental partitioning method based on a cost-aware model of trajectory similarity queries. An R-tree subtree selection strategy is used to store dynamic trajectories in HDFS (Hadoop Distributed File System) blocks, generating unoptimized intermediate partitions and working around the sequential write 
limitation of HDFS. Combinatorial optimization theory is used to formally model the partitioning optimization problem of incremental partitioning under hardware read/write capacity constraints. An HDFS-oriented cost-aware model for trajectory similarity queries, CBM (Cost-based Model), is proposed, which models the total cost of trajectory similarity queries over HDFS using the number of blocks accessed by a query and the data volume as decision variables. A CBM-based partition reorganization benefit model is devised, which models the total query benefit after partition reorganization, using upper and lower bound theory to estimate the similarity query cost of the current partition reorganization. A greedy algorithm that minimizes the query cost of the current partitions is designed, which selects the subset of intermediate partitions with the largest reorganization benefit for physical reorganization. Ingestion time is measured for Tinba and other methods on several datasets. The experimental results show that the proposed method enables efficient incremental partitioning of dynamic trajectories. 3. Research on a trajectory similarity query cardinality estimation method based on lightweight neural networks. To address the large error in trajectory similarity query cardinality estimation caused by the difficulty of efficiently characterizing a small number of multidimensional semantic trajectory samples, this thesis studies a cardinality estimation method for trajectory similarity queries based on lightweight neural networks. A query slice-based lightweight neural network model is designed to learn the embedding representation of high-dimensional query vectors from low-dimensional query slice vectors, addressing the overfitting problem of fully connected neural networks. A lightweight neural network model based on data slicing is designed, which partitions the dataset into multiple data segments using the DBSCAN 
method, and local models are trained to learn the distribution of distances between the query segments and the data segments, improving the model's learning capability. To address redundant estimation by local models, a global model that considers the distance between query vectors and the centroids of data segments is designed, and only the local models that can produce non-zero estimates are selected, improving the efficiency of cardinality estimation. Sum pooling is used to merge the embeddings of multiple queries into one, reducing the computational cost of cardinality estimation. The Q-error of the proposed method and other methods is measured on multiple datasets. The experimental results show that the proposed method produces accurate cardinality estimates for similarity queries. Through the study of semantic representations of mobility behaviors and similar behavioral patterns, this thesis measures the similarity of multidimensional semantic trajectories using a linear combination of single-dimension similarities. Based on a study of the architecture and principles of distributed computing systems, this thesis uses upper and lower bound theory to partition static multidimensional semantic trajectories of moving objects by local similarity. The main factors affecting the incremental partitioning of dynamic trajectories are studied in depth, and the partitioning optimization problem of incremental partitioning under hardware read/write capacity constraints is formally described using combinatorial optimization theory. Through the study of deep neural network model structures and learning modes, lightweight neural network models are used to build a cardinality estimation model and algorithm that accurately reflect the distance distribution of multidimensional semantic trajectory data. By combining theoretical and experimental methods, this thesis offers a useful exploration for in-depth research on big data and deep learning technology in the 
field of trajectory big data similarity computation, which is of theoretical significance and practical value for promoting the application of trajectory big data in fields such as intelligent travel, epidemiological detection, and criminal investigation. |
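
The abstract describes MDS as a weighted linear combination of textual, temporal, and spatial similarity but does not give the component functions. The following is a minimal sketch of that combination only. Everything inside it is an assumption made for illustration and not the thesis's definition: the `Trajectory` fields, Jaccard similarity for text, interval overlap ratio for time, an exponentially decayed nearest-point distance for space, and the default equal weights.

```python
from dataclasses import dataclass
from math import exp, hypot


@dataclass
class Trajectory:
    # Hypothetical representation: keyword set, time interval, 2-D points.
    keywords: set[str]
    t_start: float
    t_end: float
    points: list[tuple[float, float]]  # assumed non-empty


def text_sim(a: Trajectory, b: Trajectory) -> float:
    """Jaccard similarity of keyword sets (assumed textual measure)."""
    union = a.keywords | b.keywords
    return len(a.keywords & b.keywords) / len(union) if union else 1.0


def time_sim(a: Trajectory, b: Trajectory) -> float:
    """Overlap of the two time intervals divided by their joint span."""
    overlap = max(0.0, min(a.t_end, b.t_end) - max(a.t_start, b.t_start))
    span = max(a.t_end, b.t_end) - min(a.t_start, b.t_start)
    return overlap / span if span > 0 else 1.0


def space_sim(a: Trajectory, b: Trajectory, scale: float = 1.0) -> float:
    """Map a symmetric mean nearest-point distance into (0, 1]."""
    def one_way(p, q):
        # Mean distance from each point in p to its nearest point in q.
        return sum(
            min(hypot(x1 - x2, y1 - y2) for x2, y2 in q) for x1, y1 in p
        ) / len(p)

    d = max(one_way(a.points, b.points), one_way(b.points, a.points))
    return exp(-d / scale)


def mds(a: Trajectory, b: Trajectory,
        w_text: float = 1 / 3, w_time: float = 1 / 3,
        w_space: float = 1 / 3) -> float:
    """Weighted linear combination of the three per-dimension similarities."""
    if abs(w_text + w_time + w_space - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return (w_text * text_sim(a, b)
            + w_time * time_sim(a, b)
            + w_space * space_sim(a, b))
```

Because each component lies in [0, 1] and the weights sum to 1, the combined score also lies in [0, 1]. A single threshold on that score can therefore replace three separate per-dimension thresholds, which is the limitation the abstract says MDS is meant to overcome.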
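
The abstract reports Q-error as its accuracy metric without defining it. In the cardinality estimation literature, Q-error is usually the larger of the two ratios between the estimated and the true cardinality, so a value of 1 means a perfect estimate. The sketch below follows that common definition. The `floor` parameter, which clamps both values to at least 1 to avoid division by zero, is an assumption and may differ from the thesis's handling of empty results.

```python
def q_error(estimate: float, truth: float, floor: float = 1.0) -> float:
    """Q-error: max(est / true, true / est), with both values clamped to `floor`."""
    est = max(estimate, floor)
    tru = max(truth, floor)
    return max(est / tru, tru / est)


def mean_q_error(estimates, truths) -> float:
    """Average Q-error over paired estimates and true cardinalities."""
    pairs = list(zip(estimates, truths))
    return sum(q_error(e, t) for e, t in pairs) / len(pairs)
```

The metric is symmetric, so underestimating by a factor of two and overestimating by a factor of two are penalized equally.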