Research on New Clustering Algorithms Based on Local Information

Posted on: 2019-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: C C Ren
GTID: 2428330572452121
Subject: Computer software and theory
Abstract:
With the advent of the big data era, the demand for knowledge acquisition and information processing has grown further, and data mining technology has emerged in response. Cluster analysis, one of the most important techniques in data mining, is mainly used to discern the internal relations between objects and to discover unknown classes in a data set on the basis of similarity. Research on clustering theory and methods places higher demands on the performance and application scenarios of the related algorithms. Because existing algorithms often compare each sample with every other sample when calculating similarity (on the order of n² distance computations for n samples), the amount of computation is large and the running time is hard to accept, so these algorithms perform poorly on large-scale data sets. Making rational and proper use of local information is therefore the key to significantly improving the performance of clustering algorithms. Building on an in-depth study of a variety of classical clustering algorithms and an analysis of their defects, this thesis proposes corresponding improvements. The concrete work includes the following aspects:

(1) To identify cluster centers, a center recognition algorithm based on an improved gravitational search algorithm (GSA) is proposed. To combine the goal of identifying cluster centers with GSA, a new encoding strategy is designed in which each particle represents a set of cluster centers, so the iteration of the particles becomes a process of continuously optimizing the cluster centers. To locate the center points precisely, this chapter improves the gravitational search framework in two ways: first, it redefines the population fitness function; second, it adds a local search operation at the end of each iteration round to increase population diversity and to keep the framework from converging prematurely. The final algorithm is built on this improved framework. To verify its effectiveness, it was compared with two well-known iterative algorithms and three algorithms proposed in recent years. The results show that the proposed algorithm performs better on the test sets and noticeably improves clustering accuracy.

(2) To address the density definition and the sample allocation strategy of the density peak algorithm, an improved density peak clustering algorithm based on the data field is proposed. The new method consists of two stages. The first stage identifies the cluster centers: the neighborhood radius is first obtained from the data field by minimizing the potential entropy; the potential of each data object is then calculated with improved potential formulas and used as its density; the distance is calculated with the corresponding formulas; and the centers are determined from a new decision graph formed by density and distance. The second stage allocates the samples: the proposed algorithm divides all samples into three types (center points, core points, and suspected outliers) and designs a separate allocation strategy for each type, which overcomes the "joint" (cascading) allocation errors of the original algorithm. Results on synthetic data sets and UCI data sets show that the proposed algorithm identifies cluster centers accurately, overcomes the weakness of the original algorithm's sample allocation strategy, and improves clustering correctness.
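
The particle encoding and local search in (1) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the abstract does not give the redefined fitness function or the exact local search, so the sketch assumes the within-cluster sum of squared errors as the fitness and one k-means refinement step on the best particle as the local search. The movement rule (masses from normalized fitness, a decaying gravitational constant G, and a shrinking set of `kbest` attracting agents) follows the standard GSA formulation of Rashedi et al. The names `sse`, `decode`, `kmeans_step`, `gsa_centers` and their parameters are hypothetical.

```python
import math
import random


def sse(points, centers):
    """Assumed fitness: sum of squared distances from each point to its nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)


def decode(particle, k, dim):
    """Encoding: a particle is a flat vector of k * dim numbers, i.e. k candidate centers."""
    return [particle[i * dim:(i + 1) * dim] for i in range(k)]


def kmeans_step(points, centers):
    """Hypothetical local search: move each center to the mean of its nearest points."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        groups[j].append(p)
    # Empty groups keep their old center.
    return [[sum(col) / len(g) for col in zip(*g)] if g else list(c)
            for g, c in zip(groups, centers)]


def gsa_centers(points, k, n_agents=15, iters=60, g0=100.0, alpha=20.0, seed=0):
    """Standard GSA over particles that each encode k cluster centers."""
    rng = random.Random(seed)
    dim = len(points[0])
    n = k * dim
    # Bounds of the data box, repeated for each of the k encoded centers.
    lo = [min(p[d % dim] for p in points) for d in range(n)]
    hi = [max(p[d % dim] for p in points) for d in range(n)]
    X = [[rng.uniform(lo[d], hi[d]) for d in range(n)] for _ in range(n_agents)]
    V = [[0.0] * n for _ in range(n_agents)]
    fit = [sse(points, decode(x, k, dim)) for x in X]
    b = min(range(n_agents), key=fit.__getitem__)
    best_x, best_f = list(X[b]), fit[b]
    for t in range(iters):
        worst, best = max(fit), min(fit)
        # Masses for minimisation: best agent -> largest mass, worst -> zero.
        m = [(worst - f) / (worst - best) if worst > best else 1.0 for f in fit]
        total = sum(m)
        M = [mi / total for mi in m]
        G = g0 * math.exp(-alpha * t / iters)               # decaying gravity constant
        kbest = max(1, round(n_agents * (1 - t / iters)))   # shrinking set of attractors
        elite = sorted(range(n_agents), key=fit.__getitem__)[:kbest]
        # Accelerations are computed from the current positions before any agent moves.
        acc = [[0.0] * n for _ in range(n_agents)]
        for i in range(n_agents):
            for j in elite:
                if j != i:
                    coef = rng.random() * G * M[j] / (math.dist(X[i], X[j]) + 1e-12)
                    for d in range(n):
                        acc[i][d] += coef * (X[j][d] - X[i][d])
        for i in range(n_agents):
            for d in range(n):
                V[i][d] = rng.random() * V[i][d] + acc[i][d]
                X[i][d] = min(max(X[i][d] + V[i][d], lo[d]), hi[d])  # stay in the box
        fit = [sse(points, decode(x, k, dim)) for x in X]
        # Hypothetical local search at the end of each round: refine the best agent.
        b = min(range(n_agents), key=fit.__getitem__)
        refined = [v for c in kmeans_step(points, decode(X[b], k, dim)) for v in c]
        f_ref = sse(points, decode(refined, k, dim))
        if f_ref < fit[b]:
            X[b], fit[b] = refined, f_ref
        if fit[b] < best_f:                                  # keep the best-so-far solution
            best_x, best_f = list(X[b]), fit[b]
    return decode(best_x, k, dim)
```

Calling `gsa_centers(points, k)` returns k coordinate lists. Returning the best-so-far particle is a common GSA convention; without it, the final agent positions could be worse than a solution already seen, because every agent, including the current best, keeps moving.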
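
The first stage of (2) chooses the neighborhood radius from the data field by minimizing the potential entropy. A minimal sketch, assuming the common Gaussian-type data-field potential φ_i = Σ_j exp(−(‖x_i − x_j‖/σ)²) with unit masses (the thesis's improved potential formula is not given in the abstract), a grid search over candidate impact factors σ, and the rule of thumb that the influence radius is 3σ/√2. The function names are hypothetical.

```python
import math


def potentials(points, sigma):
    """Data-field potential of each object (assumed Gaussian form, unit masses):
    phi_i = sum_j exp(-(||x_i - x_j|| / sigma) ** 2), self term included."""
    return [sum(math.exp(-(math.dist(p, q) / sigma) ** 2) for q in points)
            for p in points]


def potential_entropy(phi):
    """H = -sum (phi_i / Z) * ln(phi_i / Z), with Z = sum of all potentials."""
    z = sum(phi)
    return -sum((f / z) * math.log(f / z) for f in phi)


def choose_sigma(points, candidates):
    """Grid search: the impact factor sigma whose potentials have minimum entropy."""
    return min(candidates, key=lambda s: potential_entropy(potentials(points, s)))


def neighborhood_radius(sigma):
    """Assumed rule: the Gaussian potential is negligible beyond 3 * sigma / sqrt(2)."""
    return 3 * sigma / math.sqrt(2)
```

Both extremes of σ make the potentials nearly equal (close to 1 when σ is tiny, close to n when it is huge), which gives the maximum entropy ln n. The minimum-entropy σ is the one under which the potentials are most uneven, that is, the one that best separates dense regions from sparse ones. How the candidate grid is built (for example, multiples of a typical pairwise distance) is left open here.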
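
The decision graph and the allocation step in (2) modify the original density peaks (DPC) procedure of Rodriguez and Laio. The sketch below is a hedged illustration of that baseline with the data-field potential used as the density, as the abstract describes. It assumes that the centers are the k points with the largest γ = ρ·δ, and it keeps the original allocation rule rather than the thesis's three-type strategy, whose details are not given. `density_peaks` is a hypothetical name.

```python
import math


def density_peaks(points, sigma, k):
    """Density-peaks clustering with a data-field potential used as the density.

    Center choice (top-k gamma = rho * delta) and allocation follow the original
    DPC scheme, not the thesis's new decision graph or three-type strategy.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Density rho_i: assumed standard data-field potential with unit masses.
    rho = [sum(math.exp(-(d / sigma) ** 2) for d in row) for row in dist]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    delta = [0.0] * n
    parent = [-1] * n  # nearest neighbour ranked earlier (higher density)
    delta[order[0]] = max(dist[order[0]])
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda c: dist[i][c])
        delta[i], parent[i] = dist[i][j], j
    # Decision graph: k largest gamma; ties broken by density, so the densest
    # point (which always has the largest gamma) is always a center.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], -rho[i]))[:k]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Original DPC allocation: walk in decreasing density and inherit the
    # parent's label. One wrong parent propagates to every point allocated through it.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels
```

The final loop is where the cascading errors arise: each point inherits its label from a single higher-density neighbor, so a point misallocated near a cluster boundary passes its wrong label to every point that later chooses it as a parent. The thesis's separate strategies for center points, core points, and suspected outliers are aimed at this step.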
Keywords: Gravitational Search Algorithm (GSA), Fitness Function, Local Search, Density Peak, Data Field, Sample Allocation