Research On Density-based Hierarchical Clustering Algorithm

Posted on: 2016-12-27  Degree: Master  Type: Thesis
Country: China  Candidate: W K Zhang  Full Text: PDF
GTID: 2308330470957743  Subject: Computer software and theory
Abstract/Summary:
Clustering is known as unsupervised classification in pattern recognition and as nonparametric density estimation in statistics. Its aim is to partition a given set of points or objects into natural groups according to their similarity, either to improve understanding when no prior knowledge is available or to serve as a method of data compression. Cluster analysis has been widely used in many fields, such as computer vision, bioinformatics, image processing, and Knowledge Discovery in Databases. Although thousands of clustering algorithms have been proposed, challenges remain: clusters of differing shapes, high dimensionality, determining the number of clusters, defining what a correct clustering is, and evaluating the results.
Density-based clustering algorithms, which classify points by identifying regions heavily populated with data, perform well on subclasses of arbitrary shape. Recently, a density-based clustering algorithm, CFSFDP (clustering by fast search and find of density peaks), was proposed by Alex Rodriguez and Alessandro Laio to detect non-spherical groups; it does not require the number of clusters to be specified in advance. In addition, CFSFDP needs few parameters, and it is computationally cheaper than iterative clustering algorithms. By identifying the number of subjects in the Olivetti Face Database, its authors demonstrated CFSFDP's capacity to handle high-dimensional data.
However, in our opinion, the elegant CFSFDP has some drawbacks that limit its application. First, as with DBSCAN, thin clusters are not captured by the decision graph. Second, a rigid hidden requirement for obtaining correct clusters is that each cluster in the data set must have exactly one density peak; otherwise CFSFDP will split natural groups. In this paper, inspired by hierarchical clustering, we present a novel hierarchical clustering algorithm based on CFSFDP.
In particular, we use CFSFDP as a tool to generate initial clusters. We then merge the initial clusters pair by pair, using an improved cluster distance model, to obtain the final clusters. Our approach can find thin clusters and, moreover, removes the strict requirement on density peaks. To demonstrate this, we benchmark our algorithm on data sets drawn from other methods' papers, in which clusters do not each have a unique density peak. Our technique partitions these data sets as well as the methods proposed in the papers where the data sets were designed, and its parameters are easier to determine.
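As a rough illustration of the initial-cluster step, the Python sketch below computes the two quantities on which CFSFDP's decision graph is built: the local density of each point and its distance from the nearest point of higher density. It is a minimal sketch under stated assumptions, not the thesis implementation: it assumes a simple cutoff kernel, picks peaks by the largest density-times-distance product rather than by inspecting the decision graph, and the names `density_peaks`, `d_c`, and `n_clusters` are hypothetical. The pairwise merging step is not sketched, because the abstract does not specify the improved cluster distance model.

```python
import numpy as np


def density_peaks(points, d_c, n_clusters):
    """Illustrative CFSFDP-style clustering (assumed cutoff kernel).

    Returns (rho, delta, labels), where rho is the local density,
    delta the distance to the nearest point ranked denser, and
    labels the cluster index of every point.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Cutoff-kernel density: number of *other* points closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # Rank points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")

    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance.
            delta[i] = dist[i].max()
            continue
        ranked_denser = order[:rank]  # points ranked ahead of i
        j = ranked_denser[np.argmin(dist[i, ranked_denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j

    # Assumed heuristic: peaks are the points with the largest rho * delta.
    gamma = rho * delta
    peaks = np.argsort(-gamma, kind="stable")[:n_clusters]

    labels = np.full(n, -1)
    labels[peaks] = np.arange(len(peaks))
    # Force the top-ranked point to be a peak so propagation has a root.
    if labels[order[0]] == -1:
        labels[order[0]] = 0

    # Each remaining point inherits the label of its nearest denser point,
    # which is always labeled already because we go in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return rho, delta, labels
```

Plotting `delta` against `rho` gives the decision graph; points that stand out in the upper right are the candidate density peaks. A cluster with several peaks would be split by this procedure, which is the case the merging step is meant to repair.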
Keywords/Search Tags: clustering, density peaks, decision graph, k-nearest neighbor graph, hierarchical clustering, similarity, closeness, density, distance from points of higher density