Research On Clustering By Fast Search And Find Of Density Peaks

Posted on:2019-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Hou


GTID:2428330545469675

Subject:Computer Science and Technology

Abstract/Summary:

With the arrival of the Web 2.0 era, text information on the Internet has grown explosively. People spend more and more time and energy organizing the information they need, and how to find useful information in this massive, noisy text promptly and accurately has become a problem that needs to be solved. As a popular and useful method in data mining, clustering has attracted more and more researchers around the world. Clustering by fast search and find of density peaks (CFSFDP) was published by Rodriguez and Laio in Science in 2014. It has been widely recognized in various fields and is an outstanding algorithm for finding cluster centers quickly and effectively. However, CFSFDP still has several disadvantages: (1) It cannot handle low-density points in the data set well, and it mistakenly assigns anomalies or hub nodes to clusters. (2) Cluster centers are selected manually, which may reduce the objectivity and accuracy of the clustering result. (3) It can hardly obtain a good clustering on data with complex structure, such as manifold-shaped data, data of varying density, and data of varying scale.

By studying CFSFDP, this thesis proposes an improved density peak search algorithm based on potential energy entropy (PEE-CFSFDP), and on this basis proposes an algorithm that fuses K-means with the improved density peak search algorithm. Experiments on UCI data sets verify that the improved text clustering algorithm has good stability and accuracy. The details are as follows.

First, after a survey of a large number of related clustering algorithms, this thesis systematically introduces the relevant background, including the various kinds of clustering algorithms and how to evaluate their performance.

Second, CFSFDP is a density-based clustering algorithm with several shortcomings: the cutoff distance used to compute local density is set manually; the clustering effect is poor on small data sets; one misassigned sample can trigger a series of further assignment errors; and samples assigned to different clusters can overlap. The thesis introduces the potential entropy of a data field to optimize the local density measure automatically (PEE-CFSFDP). The cutoff distance is determined objectively from a combined index of potential energy and entropy, so that local density is computed more reasonably and the clustering result is better founded.

Third, because K-means iterates from k randomly chosen initial cluster centers, its clustering results are unstable. This thesis presents KPEE-CFSFDP, a fusion of PEE-CFSFDP and K-means. PEE-CFSFDP is used to initialize the cluster centers and to select the value of k automatically, which compensates for K-means requiring the number of clusters in advance, its sensitivity to the initial cluster centers, and its tendency to fall into local minima. Experiments on UCI data sets and artificial data sets show that the fusion algorithm obtains better clustering results and that the clustering is very stable.
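The two ideas in the abstract, density-peak center selection and using those centers to seed K-means, can be illustrated with a minimal sketch. It is not the thesis's PEE-CFSFDP or KPEE-CFSFDP: the cutoff distance `d_c` and the number of centers `k` are set by hand here, whereas the thesis derives them from potential energy entropy. The Gaussian-kernel density, the product `rho * delta` used to rank candidate centers, and the function names `density_peaks`, `pick_centers`, and `kmeans_from_centers` are all assumptions made for illustration.

```python
import numpy as np


def density_peaks(X, d_c):
    """Compute local density rho and separation delta for each point.

    Assumption: a Gaussian kernel with bandwidth d_c is used for rho
    (it avoids the ties a hard cutoff count would produce).
    """
    # Pairwise Euclidean distances, shape (n, n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Subtract 1 to remove each point's own contribution exp(0).
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        # Distance to the nearest denser point; the densest point
        # instead gets its largest distance to any other point.
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return rho, delta


def pick_centers(X, d_c, k):
    """Return indices of the k points with the largest rho * delta."""
    rho, delta = density_peaks(X, d_c)
    return np.argsort(rho * delta)[::-1][:k]


def kmeans_from_centers(X, centers, n_iter=50):
    """Plain Lloyd iterations started from the given centers."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
                   rng.normal(5.0, 0.3, (30, 2))])
    idx = pick_centers(X, d_c=0.5, k=2)
    labels, centers = kmeans_from_centers(X, X[idx])
    print(labels)
```

Because the starting centers come from the data's density structure rather than from random draws, repeated runs on the same data give the same result, which is the stability property the abstract attributes to the fusion approach.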

Keywords/Search Tags:

K-means algorithm, Clustering by fast search and find of density peaks, Cluster analysis, Potential energy entropy

Related items

1	Research On Improvement Of Clustering By Fast Search And Find Of Density Peaks And Differential Privacy Protection
2	Cluster Analysis Application And Research Of Text Mining
3	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
4	Study On Semi-supervised Constrained Clustering By Fast Search And Find Of Density Peaks And Its Application On Air-condition Control System
5	Research And Application Of Clustering By Fast Search And Find Of Density Peaks
6	Research On Path-based Of Clustering By Fast Search And Find Of Density Peak
7	Theory And Practice Of Hybrid Clustering Algorithm Based On Density And Ant Colony
8	Research And Application On Clustering By Fast Search And Find Of Density Peaks
9	Clustering Algorithm Of Data Stream Base On Fast Search And Find Of Density Peaks
10	Text Clustering And Its Application Based On CFSFDP Algorithm