Research And Improvement On Density-Based Clustering Algorithm

Posted on:2018-05-29

Degree:Master

Type:Thesis

Country:China

Candidate:P Wang

Full Text:PDF

GTID:2348330515452356

Subject:Software engineering

Abstract/Summary:

Cluster analysis is a technique that groups data by similarity without any prior knowledge. It is known as unsupervised classification in pattern recognition and as nonparametric estimation in statistics. Cluster analysis is widely used in many fields, such as bioinformatics, information security, and text clustering. Over the past few decades, scholars have proposed thousands of clustering algorithms, but considerable room for research remains: handling clusters of different shapes and densities, processing high-dimensional data sensibly, determining the number of clusters, detecting noise points reliably, and defining and judging what a correct cluster is.

In 2014, Alex Rodriguez and Alessandro Laio proposed a new heuristic clustering algorithm, CFSFDP (Clustering by fast search and find of density peaks). The algorithm has few parameters, runs quickly, and can effectively detect the number of target clusters and handle noise. Its authors validated it through a series of experiments and used image clustering on the Olivetti face database to show that it can handle high-dimensional data.

However, our study found that the algorithm performs poorly in some cases. First, the initial cluster centers must be selected manually, and cluster centers in sparse regions cannot be extracted effectively. Second, the algorithm assumes that each cluster in the data set has exactly one local density extreme point, which leads to misclassification of clusters that have multiple density extreme points and of clusters that share a local density extreme point. Third, the algorithm's method of identifying noise points causes too many data points to be judged as noise.

Based on these findings, we propose a new algorithm based on finding density peaks. First, it automatically identifies the cluster centers by locating the inflection point of the decision graph. Second, it uses the local density distribution of sub-clusters together with an improved hierarchical clustering algorithm to split and merge clusters. Finally, it identifies noise in the data by an outlier feature. Experimental results show that the improved algorithm achieves better clustering results than the original algorithm.
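For orientation, the baseline CFSFDP procedure that the abstract critiques can be sketched roughly as below. This is a minimal illustration of the original method as published by Rodriguez and Laio, not the improved algorithm proposed in the thesis, whose details the abstract does not give. The function name `density_peaks`, the parameter `n_centers`, and the toy data are assumptions made for illustration. The cutoff distance `d_c`, the local density `rho`, the distance `delta` to the nearest denser point, and the product `gamma = rho * delta` follow the original paper. The noise (halo) step is omitted.

```python
import math

def density_peaks(points, d_c, n_centers):
    """Illustrative CFSFDP sketch; returns (labels, rho, delta)."""
    if n_centers < 1:
        raise ValueError("n_centers must be at least 1")
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points strictly closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Rank points from densest to sparsest; ties are broken by index,
    # so "denser" below means "ranked earlier" (an assumption for ties).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n
    parent = [-1] * n  # nearest point of higher density
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Manual choice of the number of centers: take the n_centers points
    # with the largest gamma. The densest point always ranks first.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_centers]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Each remaining point inherits the label of its nearest denser point,
    # which has already been labeled because we visit in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta

# Toy data (hypothetical): two well-separated square blobs.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, rho, delta = density_peaks(points, d_c=1.0, n_centers=2)
print(labels)
```

In this sketch the caller must supply `n_centers`, which mirrors the manual center selection that the abstract names as the first weakness. Because each cluster grows from a single density peak, a cluster with several peaks would be split or absorbed, which corresponds to the second weakness.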

Keywords/Search Tags:

clustering, density peaks, decision graph, hierarchical clustering, outlier feature, inflection point, local density, distribution

Related items

1	Research On Density-based Hierarchical Clustering Algorithm
2	The Research And Application Of Density Peaks Clustering
3	An Improved Density Peaks Clustering Algorithm
4	Research On Hierarchical Clustering Algorithm Based On Density Peaks
5	An Adaptive Multi-granular Clustering Model Based On Density Peaks
6	Research And Application Of Density Peak Clustering Algorithm Based On Natural Neighbors And Representative Points
7	Research On Density Peaks Clustering
8	The Research On Arbitrary Shape Cluster Algorithm Based On Hierarchy And Density
9	Research And Improvement Of Multi-density-based Clustering Algorithms
10	The Research Of Optimized Density Peaks Clustering And Its Distributed Algorithms