
Research And Improvement On Density-Based Clustering Algorithm

Posted on: 2018-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
Full Text: PDF
GTID: 2348330515452356
Subject: Software engineering
Abstract/Summary:
Cluster analysis groups data according to the similarity between data points, without any prior knowledge. It is known as unsupervised classification in pattern recognition and as nonparametric estimation in statistics. Cluster analysis is widely used in fields such as bioinformatics, information security, and text clustering. Over the past few decades, scholars have proposed thousands of clustering algorithms, but substantial open problems remain: handling clusters of different shapes and densities, processing high-dimensional data sensibly, determining the number of clusters, detecting noise points reliably, and defining and judging what constitutes a correct cluster.

In 2014, Alex Rodriguez and Alessandro Laio proposed a new heuristic clustering algorithm, CFSFDP (Clustering by fast search and find of density peaks). The algorithm has few parameters, runs quickly, effectively detects the number of target clusters, and handles noise. Its authors validated it in a series of experiments and used image clustering on the Olivetti face database to show that it can handle high-dimensional data. However, our study found that the algorithm performs poorly in some cases. First, the initial cluster centers must be selected manually, and cluster centers in sparse regions cannot be extracted effectively. Second, the algorithm assumes that each cluster in the data set has exactly one local density extreme point, which leads to misclassification of clusters that have multiple density extreme points and of clusters that share a local density extreme point. Third, the algorithm's method of identifying noise points causes an excessive number of data points to be judged as noise.

Based on these findings, we propose a new algorithm based on finding density peaks. First, it identifies cluster centers automatically by locating the inflection point of the decision graph. Second, it uses the local density distribution of sub-clusters together with an improved hierarchical clustering algorithm to split and merge clusters. Finally, it identifies noise in the data by an outlier feature. Experimental results show that the improved algorithm achieves better clustering results than the original algorithm.
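
For readers unfamiliar with the baseline, the following is a minimal sketch of the core quantities of the original CFSFDP algorithm, not of the improved algorithm proposed in this thesis. It assumes a cutoff-kernel local density (the number of neighbours closer than a cutoff distance) and uses the product of density and distance as a simple stand-in for reading cluster centers off the decision graph by hand. The function name `density_peaks` and the parameters `d_c` and `n_centers` are illustrative assumptions.

```python
import numpy as np

def density_peaks(points, d_c, n_centers):
    """Sketch of baseline CFSFDP: returns (rho, delta, labels).

    d_c       -- cutoff distance (assumed parameter name)
    n_centers -- number of centers to keep; stands in for manual
                 selection on the decision graph (an assumption)
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances via broadcasting.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density rho: neighbours within d_c, excluding the point itself.
    rho = (dist < d_c).sum(axis=1) - 1

    # Indices sorted by decreasing density; ties broken by index order.
    order = np.argsort(-rho, kind="stable")

    # delta: distance to the nearest point that comes earlier in `order`.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; by convention
            # it receives its largest distance to any other point.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Decision-graph score: centers have both high rho and high delta.
    gamma = rho * delta
    # Rank in density order so that ties favour the denser point.
    top = np.argsort(-gamma[order], kind="stable")[:n_centers]
    centers = order[top]

    # Each remaining point takes the label of its nearest denser neighbour.
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, labels
```

Plotting `delta` against `rho` gives the decision graph mentioned in the abstract. Because each point simply inherits the label of one denser neighbour, a cluster with several density peaks can be split or absorbed incorrectly, which is the kind of failure the abstract describes.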
Keywords/Search Tags:clustering, density peaks, decision graph, hierarchical clustering, outlier feature, inflection point, local density, distribution