
Research On Clustering By Fast Search And Find Of Density Peaks

Posted on: 2019-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Hou
Full Text: PDF
GTID: 2428330545469675
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the Web 2.0 era, text information on the network has grown explosively. People spend more and more time and energy organizing the information they need from the Internet, so how to search massive, noisy text promptly and accurately for information useful to the user has become a pressing problem. As a popular and useful method in the data mining field, clustering has attracted more and more researchers all over the world. Clustering by fast search and find of density peaks (CFSFDP) was published by Rodriguez and Laio in Science in 2014. CFSFDP has been widely recognized in various fields, and it is an outstanding algorithm for finding cluster centers quickly and effectively. However, CFSFDP still has several disadvantages: (1) It cannot properly handle low-density data points in the data set, and it mistakenly assigns anomalies or hub nodes to clusters. (2) Cluster centers are selected manually, which may reduce the objectivity and accuracy of the clustering result. (3) It can hardly obtain a good clustering on data with complex structure, such as manifold-shaped data, data of differing density, and data of varying scale.

By studying CFSFDP, this thesis proposes an improved fast density peak search algorithm based on potential energy entropy (PEE-CFSFDP). On this basis, it proposes an algorithm that fuses K-means with the improved fast density peak search algorithm, and experiments on UCI data sets verify that the improved text clustering algorithm has good stability and accuracy. The details are as follows:

First, after reviewing a large number of related clustering algorithms, this thesis systematically introduces the relevant background on clustering, such as the various clustering algorithms and how to evaluate their performance.

Second, CFSFDP is a density-based clustering algorithm with several shortcomings: the cutoff distance used to calculate local density must be set manually; the clustering effect is poor on small data sets; one sample assignment error triggers a series of further errors; and samples assigned to different clusters overlap. The thesis introduces the concept of potential energy entropy from data fields to optimize the local density measure automatically (PEE-CFSFDP). The cutoff distance is determined objectively from a comprehensive index of potential energy and entropy, so the local density is calculated more reasonably and the clustering result is more sound.

Third, the K-means algorithm iterates from k randomly chosen initial cluster centers, which makes its clustering results unstable. This thesis presents KPEE-CFSFDP, a clustering algorithm that blends PEE-CFSFDP and K-means. PEE-CFSFDP is used to initialize the cluster centers and to select the value of k automatically, compensating for three weaknesses of K-means: the number of clusters must be given in advance, the result is sensitive to the choice of initial cluster centers, and the algorithm tends to fall into local minima. Experiments on UCI data sets and artificial data sets show that the fusion algorithm obtains better clustering results and that the clustering is very stable.
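
For readers unfamiliar with the baseline, the following is a minimal Python sketch of the original density peaks procedure of Rodriguez and Laio, not of the thesis's PEE-CFSFDP or KPEE-CFSFDP. It computes a local density rho and a distance delta to the nearest higher-density point for each sample, takes the k samples with the largest rho * delta as centers, and assigns every other sample to the cluster of its nearest higher-density neighbor. The function name `density_peaks`, the Gaussian kernel, and the percentile rule for the cutoff distance `dc` are assumptions made for illustration; the cutoff distance is exactly the manually set parameter that the thesis aims to determine objectively.

```python
import numpy as np


def density_peaks(X, k, dc_percentile=2.0):
    """Baseline density-peaks clustering (illustrative sketch only).

    X: (n, d) array of samples; k: number of centers to pick;
    dc_percentile: hypothetical hand-set rule for the cutoff distance.
    """
    n = len(X)
    # Pairwise Euclidean distance matrix.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # Cutoff distance dc: a percentile of all pairwise distances (assumption).
    dc = np.percentile(d[np.triu_indices(n, k=1)], dc_percentile)

    # Local density with a Gaussian kernel; subtract 1 to drop the self term.
    rho = np.exp(-((d / dc) ** 2)).sum(axis=1) - 1.0

    # delta: distance to the nearest sample of higher density.
    order = np.argsort(-rho)  # indices sorted by descending density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = d[order[0]].max()  # convention for the densest sample
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]
        j = higher[np.argmin(d[i, higher])]
        delta[i] = d[i, j]
        nearest_higher[i] = j

    # Centers: the k samples with the largest rho * delta.
    centers = np.argsort(-(rho * delta))[:k]
    labels = np.full(n, -1)
    labels[centers] = np.arange(k)

    # Visit samples in descending density so each parent is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, rho, delta
```

Because each sample inherits the label of its nearest higher-density neighbor, one wrong assignment propagates to every sample below it, which is the chain of errors the abstract mentions. As one possible way to mimic the fusion idea, the rows `X[centers]` could be passed as initial centers to a K-means implementation, for example scikit-learn's `KMeans(n_clusters=k, init=X[centers], n_init=1)`; whether the thesis does this in the same way is not stated in the abstract.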
Keywords/Search Tags: K-means algorithm, Clustering by fast search and find of density peaks, Cluster analysis, Potential energy entropy