
Research On Clustering Algorithms Of Location Big Data Based On MapReduce

Posted on: 2020-06-20  Degree: Master  Type: Thesis
Country: China  Candidate: Y S Hu  Full Text: PDF
GTID: 2428330599476497  Subject: Computer technology
Abstract/Summary:
With the spread of positioning devices and improvements in the accuracy of positioning satellites, location-based applications have multiplied, and the volume and complexity of location data have grown significantly. Research on the acquisition, processing, analysis, storage, and visualization of location data is therefore valuable to both academia and industry. Cluster analysis of large location datasets can extract patterns from large volumes of data and so yield valuable information. Traditional serial clustering algorithms usually struggle to process big data efficiently, so parallel clustering algorithms have become an active research topic. This thesis studies and optimizes parallel clustering algorithms based on MapReduce. The parallel processing framework improves the efficiency of clustering large location data while preserving clustering quality. The main work and contributions are as follows.

1. This thesis proposes a cell clustering algorithm based on MapReduce and strongly connected fusion to improve the efficiency of processing large location data. First, clustering results for the data subsets are obtained with an improved DBSCAN algorithm based on MapReduce. Next, the relationship between grids and clusters is analyzed, and the concepts of a Grid-cluster and of connectivity between Grid-clusters are defined. Then the connectivity weight matrix between pairs of Grid-clusters is computed. Finally, the connectivity weight determines whether two Grid-clusters are merged. The experimental results show that the proposed algorithm achieves high efficiency and high clustering quality on large location data.

2. To address the sensitivity of density-based clustering algorithms to their parameters, this thesis proposes an optimal 2ε-neighborhood clustering algorithm. First, the concept of the 2ε-neighborhood is defined, building on the ε-neighborhood of traditional density-based clustering. On that basis, an optimization algorithm is proposed that searches for the most suitable value of ε. The experimental results show that the value of ε obtained in this way is more reasonable and that the algorithm is adaptive. The optimal 2ε-neighborhood clustering algorithm is then implemented on the MapReduce framework, and the experimental results show that this MapReduce-based implementation achieves high clustering quality on large location data.

Overall, the thesis studies the working mechanisms of MapReduce and of clustering algorithms. Its central contribution is the optimization and parallelization of clustering algorithms based on MapReduce, with the aim of achieving higher processing efficiency and quality when analyzing location big data. Future work will focus on optimizing how parallel tasks are decomposed and merged, for example how to partition the data so that clustering is more efficient, and how to aggregate the results from data subsets so as to ensure both high efficiency and better clustering quality.
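The partition, cluster, and merge pipeline described in contribution 1 can be illustrated with a small single-process sketch. This is not the thesis's implementation: the abstract does not define the improved DBSCAN or the connectivity weight, so the sketch uses plain DBSCAN inside each grid cell and assumes, purely for illustration, that the connectivity weight of two Grid-clusters is the number of cross-cluster point pairs lying within ε of each other. All function names (`map_to_cells`, `local_dbscan`, `merge_grid_clusters`, `cluster`) and the `min_weight` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import dist, floor

UNVISITED, NOISE = -2, -1

def map_to_cells(points, cell_size):
    """Map step (sketch): key each point by the grid cell that contains it."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(floor(x / cell_size), floor(y / cell_size))].append((x, y))
    return cells

def local_dbscan(points, eps, min_pts):
    """Plain DBSCAN on one cell's points; returns clusters, dropping noise."""
    def region(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [UNVISITED] * len(points)
    clusters = []
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cid = len(clusters)
        clusters.append([points[i]])
        labels[i] = cid
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:  # border point: join, but do not expand
                labels[j] = cid
                clusters[cid].append(points[j])
            elif labels[j] == UNVISITED:
                labels[j] = cid
                clusters[cid].append(points[j])
                neighbors = region(j)
                if len(neighbors) >= min_pts:  # core point: keep expanding
                    seeds.extend(neighbors)
    return clusters

def merge_grid_clusters(grid_clusters, eps, min_weight):
    """Reduce step (sketch): union Grid-clusters whose assumed connectivity
    weight (count of cross-cluster pairs within eps) reaches min_weight."""
    parent = list(range(len(grid_clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(grid_clusters)), 2):
        weight = sum(1 for p in grid_clusters[i] for q in grid_clusters[j]
                     if dist(p, q) <= eps)
        if weight >= min_weight:
            parent[find(i)] = find(j)

    merged = defaultdict(list)
    for i, members in enumerate(grid_clusters):
        merged[find(i)].extend(members)
    return list(merged.values())

def cluster(points, cell_size, eps, min_pts, min_weight):
    grid_clusters = []
    for cell_points in map_to_cells(points, cell_size).values():
        grid_clusters.extend(local_dbscan(cell_points, eps, min_pts))
    return merge_grid_clusters(grid_clusters, eps, min_weight)

# One dense strip straddling the boundary between two cells, plus a distant group.
points = [(0.7, 0.5), (0.8, 0.5), (0.9, 0.5), (1.0, 0.5), (1.1, 0.5), (1.2, 0.5),
          (5.0, 5.0), (5.1, 5.0), (5.2, 5.0)]
result = cluster(points, cell_size=1.0, eps=0.35, min_pts=3, min_weight=2)
print(sorted(len(c) for c in result))  # [3, 6]
```

In the example, the strip is split into two Grid-clusters by the cell boundary and then rejoined by the merge step, while the distant group stays separate. Note that clustering each cell in isolation ignores neighbors across cell boundaries, so sparse points near an edge can be misclassified as noise; how the thesis's improved DBSCAN handles this is not stated in the abstract.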
Keywords/Search Tags: data mining, location big data, MapReduce, DBSCAN, grid