K-means Algorithm For Optimizing Initial Clustering Centers Based On Improved Density Peak

Posted on: 2019-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: A Z Bai
Full Text: PDF
GTID: 2428330545960160
Subject: Applied Mathematics
Abstract/Summary:
Data mining technology extracts useful knowledge from large amounts of data to support decision making. Clustering analysis is a basic and widely used tool in data mining. The K-means algorithm is a typical partition-based clustering technique; it is popular because it is simple to implement, converges quickly, and handles large-scale data well. However, the algorithm has several problems: the number of clusters K must be set manually in advance, and the initial clustering centers are selected at random, which can make the clustering results unstable.

The density peak clustering algorithm (DPC algorithm) is a clustering algorithm proposed in 2014. It quickly finds the density peak points (cluster centers) of data sets of arbitrary shape, assigns sample points and removes outliers efficiently, and has parameters that are easy to determine. The algorithm is suitable for clustering analysis of large-scale data, and it is well suited to the problem of selecting initial cluster centers.

To address the need to set K manually and the instability caused by random initial centers, a new K-means algorithm based on an improved density peak algorithm is proposed. The improved DPC algorithm determines both the initial clustering centers and the number of clusters K, which remedies these defects of the K-means algorithm. To address the difficulty of selecting microarray genes, we combine the improved K-means algorithm with the particle swarm optimization algorithm (PSO algorithm) and introduce a new gene selection method based on improved K-means fused with particle swarm optimization (IK-PSO algorithm), which reduces the optimization difficulty of the PSO algorithm.

To verify the validity and feasibility of the proposed algorithms, we test them on data sets selected from the UCI database. The experimental results show that: (1) the K-means algorithm based on the improved density peak algorithm obtains better initial clustering centers and more stable clustering results, and converges faster than before, which demonstrates the effectiveness of the algorithm; (2) the IK-PSO algorithm reduces the optimization difficulty of the PSO algorithm and significantly improves classification performance, which demonstrates the feasibility and effectiveness of the algorithm.
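The idea of seeding K-means from density peaks can be illustrated with a minimal sketch. This is not the thesis's improved DPC algorithm, whose details the abstract does not give; it uses the plain 2014 DPC scores (local density rho, distance delta to the nearest denser point, ranked by rho * delta). The Gaussian density kernel, the cutoff distance `dc`, the function names, and the fact that `k` is passed in by the caller (the thesis derives K automatically) are all assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dpc_initial_centers(points, k, dc):
    """Pick k initial centers by ranking gamma = rho * delta (plain DPC).

    k and dc are supplied by the caller here; this is an assumption,
    not the thesis's automatic selection of K.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # Local density rho with a Gaussian kernel (assumed kernel choice).
    rho = [
        sum(math.exp(-((d[i][j] / dc) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor gets its largest distance.
        delta.append(min(higher) if higher else max(d[i]))
    order = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [points[i] for i in order[:k]]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[idx].append(p)
        # Recompute each center as its cluster mean; keep empty clusters fixed.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [
        min(range(len(centers)), key=lambda c: dist(p, centers[c]))
        for p in points
    ]
    return centers, labels
```

A call such as `kmeans(points, dpc_initial_centers(points, k=2, dc=1.0))` replaces the random initialization with deterministic seeds, so repeated runs on the same data give the same clustering. The pairwise distance matrix makes this sketch quadratic in the number of points, so it is meant for small examples only.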
Keywords/Search Tags: Clustering analysis, DPC algorithm, K-means algorithm, PSO algorithm, IK-PSO algorithm