Research On Parallel Density-based Clustering Algorithm In Big Data

Posted on: 2022-08-18  Degree: Master  Type: Thesis
Country: China  Candidate: K B Xu  Full Text: PDF
GTID: 2518306524998599  Subject: Computer technology
Abstract/Summary:
As an unsupervised data mining technique, density-based clustering lets people use the features of the data to uncover latent distribution relationships among objects. With the advent of the big data era, however, every industry faces pressure to process massive data sets, and how to extract useful value and information from them quickly and effectively has become a research hotspot for density-based clustering under big data. With the wide adoption of MapReduce and other distributed frameworks, parallel density-based algorithms built on distributed computing architectures have gradually become the main research direction. Research on the timeliness and accuracy of parallel density-based clustering is especially important: it provides a basic guarantee for rapid processing and real-time mining of useful value, and it matters for accurate decision-making across industries. Although parallel density-based clustering under big data has achieved some results, existing algorithms generally suffer from low parallel efficiency and low clustering accuracy. The causes are unreasonable and computationally inefficient data partitioning and merging during parallel processing, together with limitations of the density-based clustering algorithms themselves. To address these problems, this thesis improves the performance of parallel density-based clustering in three ways. First, an adaptive data partitioning strategy is designed according to the spatial distribution of the data points, to make the partitioning more reasonable. Second, the MapReduce parallel strategy at each stage is designed and optimized using a fast index storage structure. Third, a weighted grid is used to strengthen the correlation within data partitions, and a swarm intelligence optimization algorithm is used to address the sensitivity of the clustering algorithm to parameter selection, improving the clustering quality.

To address the unreasonable grid division of data, weak parameter optimization, and low parallel efficiency of density-based clustering algorithms under big data, this thesis proposes MR-DBIFOA, a density-based clustering algorithm using improved fruit fly optimization based on MapReduce. First, a KD-Tree-based division strategy (KDG) is proposed to divide the grid cells adaptively. Second, an improved fruit fly optimization algorithm (IFOA) is designed, which uses a knowledge-learning-based step strategy (KLSS) and a clustering criterion function (CFF). Based on IFOA, the optimal parameters for local clustering are selected dynamically, which improves the local clustering results. Meanwhile, to improve parallel efficiency, a density-based clustering algorithm using IFOA (MR-QRMEC) is proposed to compute the local clusters in parallel. Finally, based on QR-Tree and MapReduce, a cluster merging algorithm (MR-QRMEC) is proposed to obtain the clustering result more quickly, which improves the efficiency of merging core clusters. Experimental results show that MR-DBIFOA achieves better clustering results and better parallel performance on big data.
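As a rough illustration of KD-Tree-style adaptive division, the sketch below recursively splits a point set at the median of alternating axes until each cell holds at most a fixed number of points. It is a simplified stand-in and not the KDG strategy of this thesis; the function name kd_partition and the parameter max_points are hypothetical.

```python
from statistics import median


def kd_partition(points, max_points=4, depth=0):
    """Split points into cells of at most max_points, KD-Tree style.

    points is a list of equal-length coordinate tuples. The split axis
    cycles with the recursion depth, and the split value is the median
    coordinate on that axis, so dense regions get more, smaller cells.
    """
    if len(points) <= max_points:
        return [points]

    axis = depth % len(points[0])
    pivot = median(p[axis] for p in points)
    left = [p for p in points if p[axis] < pivot]
    right = [p for p in points if p[axis] >= pivot]

    # Degenerate split (many ties on this axis): stop instead of
    # recursing forever. This simplification may leave a cell larger
    # than max_points.
    if not left or not right:
        return [points]

    return (kd_partition(left, max_points, depth + 1)
            + kd_partition(right, max_points, depth + 1))
```

In a MapReduce setting, each resulting cell could be handed to a separate map task for local clustering, with a later reduce step merging clusters that span cell boundaries.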
Keywords/Search Tags: big data, density-based clustering algorithm, parallelization, weighted grid, swarm intelligence optimization algorithm