Research And Implementation Of Distributed In-situ Trajectory Clustering Algorithm

Posted on: 2021-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Chen
GTID: 2428330623468557
Subject: Engineering
Abstract/Summary:
Cluster analysis is one of the most commonly used techniques in data mining. A clustering algorithm divides a data set into several subsets so that elements within the same subset are similar to one another, while elements in different subsets are dissimilar. Such algorithms conventionally take the entire data set as input. With the advent of the big data era, however, centralized data storage has exposed more and more problems, and distributed data storage has become widely used. Meanwhile, in the era of the mobile Internet, user trajectory data has begun to accumulate rapidly, so clustering trajectory data in a distributed environment has become an urgent problem. The distributed trajectory clustering algorithms proposed in this thesis address three aspects: network bandwidth consumption, data privacy, and clustering accuracy. The main contents are as follows:

(1) A distributed trajectory clustering algorithm based on composite sampling, the CSD-Clustering algorithm, is proposed. First, the clustering-accuracy problem of distributed trajectory clustering algorithms that combine local clustering with global clustering is analyzed. To solve this problem, the trajectory model is fitted by combining polynomial fitting with optimization theory. This model-fitting method preserves the accuracy of distributed clustering well and also protects the privacy of the trajectory data to a certain extent. In addition, the algorithm uses a composite sampling scheme, which effectively reduces network transmission cost. Finally, the effectiveness and feasibility of the algorithm are verified by simulation experiments. The results show that CSD-Clustering accurately completes the distributed trajectory clustering task and also performs better in terms of privacy and bandwidth consumption.

(2) A distributed trajectory clustering algorithm based on Markov chains, the MCD-Clustering algorithm, is proposed. First, the problems that arise when distributed clustering algorithms built on describing the distribution of trajectory data are applied to high-dimensional data are analyzed. To address them, the algorithm exploits the correlation between the dimensions of high-dimensional trajectory data and proposes a method that describes the distribution of each trajectory subcluster with a Markov chain model. The method mainly transmits the transition matrix of the Markov chain model over the network, and stores this matrix in a sparse format to relieve pressure on network bandwidth. This solves the problem that current distributed clustering algorithms cannot accurately describe the distribution characteristics of high-dimensional trajectory data, and it addresses the shortcomings of CSD-Clustering in network bandwidth consumption and privacy protection, although it is slightly inferior to CSD-Clustering in clustering accuracy.

(3) The two distributed trajectory clustering algorithms proposed in this thesis are implemented on a multi-center big data analysis system based on in-situ computation. First, the overall system architecture is described, and the modules involved in distributed clustering (the model training module, network communication module, comprehensive calculation module, and cluster evaluation module) are designed in detail. Finally, screenshots of the visual interface illustrate the system's workflow and operation.
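To make the model-fitting idea in (1) concrete, the following is a minimal sketch, not the thesis's method: each coordinate of a trajectory is fitted as a low-degree polynomial in time by ordinary least squares, so that a site could transmit a few coefficients instead of raw points. The abstract does not specify the polynomial degree, the optimization formulation, or the composite sampling scheme; the plain least-squares objective, the degree parameter, and the names `polyfit`, `fit_trajectory`, and `evaluate` are assumptions for illustration, and sampling is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (t, x, y); this layout is an assumption


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit(ts: Sequence[float], vs: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients [c0, c1, ..., c_degree] via the normal equations."""
    k = degree + 1
    # Normal equations (X^T X) c = X^T v with design matrix X[i][j] = ts[i] ** j.
    xtx = [[sum(t ** (i + j) for t in ts) for j in range(k)] for i in range(k)]
    xtv = [sum((t ** i) * v for t, v in zip(ts, vs)) for i in range(k)]
    return _solve(xtx, xtv)


def fit_trajectory(points: Sequence[Point], degree: int = 2) -> Tuple[List[float], List[float]]:
    """Hypothetical local model: fit x(t) and y(t) separately; only coefficients leave the site."""
    ts = [p[0] for p in points]
    cx = polyfit(ts, [p[1] for p in points], degree)
    cy = polyfit(ts, [p[2] for p in points], degree)
    return cx, cy


def evaluate(coeffs: Sequence[float], t: float) -> float:
    """Evaluate a fitted polynomial at time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


if __name__ == "__main__":
    traj = [(float(t), 1.0 + 2.0 * t, 0.5 * t * t) for t in range(10)]
    cx, cy = fit_trajectory(traj, degree=2)
    print(cx, cy)  # approximately [1, 2, 0] and [0, 0, 0.5]
```

Here ten raw points (30 numbers) are summarized by six coefficients. The degree trades fit accuracy against payload size and is left as a parameter; coefficients only partially hide the underlying points, which is consistent with the abstract's claim of privacy "to a certain extent".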
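The Markov-chain description in (2) can be illustrated with a minimal sketch under stated assumptions: trajectory points are mapped to discrete states (here, cells of a uniform grid, since the abstract does not say how states are defined), transition counts between consecutive states are accumulated over a subcluster's trajectories, and each row is normalized into transition probabilities. Only nonzero entries are stored, in a dictionary-of-keys form, and flattened into sorted triples for transmission; this is one simple sparse format and the thesis may use another. The names `cell_of`, `transition_matrix`, `to_coo`, and `cell_size` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, int]  # hypothetical state: a grid cell index


def cell_of(x: float, y: float, cell_size: float) -> State:
    """Map a point to the grid cell containing it (assumed state definition)."""
    return (int(x // cell_size), int(y // cell_size))


def transition_matrix(
    trajectories: Sequence[Sequence[Tuple[float, float]]], cell_size: float
) -> Dict[State, Dict[State, float]]:
    """Estimate a row-normalized transition matrix, keeping only nonzero entries."""
    counts: Dict[State, Dict[State, int]] = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        states = [cell_of(x, y, cell_size) for x, y in traj]
        # Count each consecutive pair of states as one observed transition.
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    probs: Dict[State, Dict[State, float]] = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs


def to_coo(probs: Dict[State, Dict[State, float]]) -> List[Tuple[State, State, float]]:
    """Flatten the sparse matrix into sorted (from, to, probability) triples."""
    return sorted((a, b, p) for a, row in probs.items() for b, p in row.items())


if __name__ == "__main__":
    subcluster = [
        [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)],
        [(0.5, 0.5), (0.5, 1.5)],
    ]
    for triple in to_coo(transition_matrix(subcluster, cell_size=1.0)):
        print(triple)
```

Because each state typically has only a few observed successors, the number of stored entries grows with the number of observed transitions rather than with the square of the number of states, which is the bandwidth argument the abstract makes for sparse storage.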
Keywords/Search Tags: Distributed, clustering, data privacy, trajectory data, network bandwidth consumption