Research On Density-based Hierarchical Clustering Algorithm

Posted on:2016-12-27

Degree:Master

Type:Thesis

Country:China

Candidate:W K Zhang

GTID:2308330470957743

Subject:Computer software and theory

Abstract/Summary:

Clustering is known as unsupervised classification in pattern recognition, or as nonparametric density estimation in statistics. Its aim is to partition a given data set of points or objects into natural groups according to their similarity. This can improve understanding of the data when no prior knowledge is available, or serve as a means of compressing the data. Cluster analysis has been widely used in many fields, such as computer vision, bioinformatics, image processing, and Knowledge Discovery in Databases. Although thousands of clustering algorithms have been proposed, challenges remain: clusters of differing shapes, high dimensionality, determining the number of clusters, defining what a correct clustering is, and the difficulty of evaluating results. Density-based clustering algorithms, which classify points by identifying regions densely populated with data, perform well on clusters of arbitrary shape.
Recently, a density-based clustering algorithm, CFSFDP (clustering by fast search and find of density peaks), was proposed by Alex Rodriguez and Alessandro Laio. It detects non-spherical groups of varying shapes and does not require the number of clusters to be specified in advance. In addition, CFSFDP needs few parameters and is computationally cheaper than iterative clustering algorithms. By identifying the number of subjects in the Olivetti Face Database, its authors showed that CFSFDP can handle high-dimensional data. However, in our opinion, the elegant CFSFDP has some drawbacks that limit its application. First, as with DBSCAN, thin clusters are not captured by the decision graph. Second, there is a strict hidden requirement for obtaining correct clusters: each cluster in the data set must have exactly one density peak; otherwise CFSFDP splits natural groups. In this paper, inspired by hierarchical clustering, we present a novel hierarchical clustering algorithm based on CFSFDP.
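The CFSFDP procedure referred to above (a local density ρ for each point, the distance δ to the nearest point of higher density, centre selection on the decision graph, and assignment of every other point to the cluster of its nearest denser neighbour) can be sketched as follows. This is a minimal sketch, not the thesis's code. The Gaussian density kernel, the cutoff distance `d_c`, and the rule of taking the `n_clusters` points with the largest γ = ρ·δ (instead of reading centres off the decision graph by eye) are assumptions made for illustration.

```python
import numpy as np

def density_peaks(X, d_c, n_clusters):
    """Minimal CFSFDP-style sketch. Returns (labels, rho, delta, centers)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory; fine for small examples).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Local density with a Gaussian kernel (assumption); subtract the self term exp(0) = 1.
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
    # Rank points by decreasing density; a stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    nearest_higher = np.full(n, -1)
    # The densest point gets the largest distance to any other point, by convention.
    delta[order[0]] = D[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]                    # points ranked denser than i
        j = higher[np.argmin(D[i, higher])]     # nearest denser neighbour
        delta[i] = D[i, j]
        nearest_higher[i] = j
    # Decision-graph heuristic (assumption): the largest gamma = rho * delta are centres.
    gamma = rho * delta
    gamma[order[0]] = np.inf                    # the densest point is always a centre
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Densest first, so each nearest denser neighbour is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta, centers
```

Because each non-centre point simply inherits the label of a denser neighbour, a natural group that contains two density peaks receives two labels whenever both peaks are selected as centres, which is the splitting behaviour described above.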
In particular, we use CFSFDP as a tool to generate initial clusters. We then merge the initial clusters pair by pair, using an improved cluster distance model, to obtain the final clusters. Our approach can find thin clusters; moreover, it removes the strict requirement of one density peak per cluster. To demonstrate this, we benchmark our algorithm on data sets taken from the literature on other methods, in which the clusters do not each have a unique density peak. On these data sets our technique produces partitions as good as those obtained by the methods proposed in the papers that introduced the data sets, and its parameters are easier to determine.
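The merging stage can be sketched as a plain agglomerative loop over the initial clusters. The abstract does not give the formula of the improved cluster distance model, so `cluster_distance` below is a hypothetical stand-in (the mean of the k smallest cross-cluster point distances, loosely motivated by the k-nearest-neighbour and closeness keywords). The stopping rule, a target number of clusters `n_final`, is also an assumption.

```python
import numpy as np

def cluster_distance(A, B, k=3):
    """Hypothetical closeness measure (not the thesis's model):
    mean of the k smallest point-to-point distances between clusters A and B."""
    D = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)).ravel()
    k = min(k, D.size)
    return np.sort(D)[:k].mean()

def merge_initial_clusters(X, labels, n_final, k=3):
    """Merge initial clusters pair by pair, closest pair first, until
    n_final clusters remain. Returns labels renumbered 0..n_final-1."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Map each initial cluster id to the indices of its points.
    clusters = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    while len(clusters) > n_final:
        ids = sorted(clusters)
        best = None
        # Exhaustively find the closest pair under the hypothetical measure.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = cluster_distance(X[clusters[a]], X[clusters[b]], k)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = np.concatenate([clusters[a], clusters.pop(b)])
    out = np.empty(len(X), dtype=int)
    for new_id, c in enumerate(sorted(clusters)):
        out[clusters[c]] = new_id
    return out
```

For example, a thin line of points that the initial partition has cut into three pieces is reassembled, because adjacent pieces are much closer to each other than to a separate compact blob. A measure built from closest pairs behaves like single linkage: it follows thin, elongated clusters but is sensitive to chains of noise points bridging two groups; averaging over k pairs only dampens that effect.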

Keywords/Search Tags:

clustering, density peaks, decision graph, k-nearest neighbor graph, hierarchical clustering, similarity, closeness, density, distance from points of higher density

Related items

1	The Research And Application Of Density Peaks Clustering
2	Research And Improvement On Density-Based Clustering Algorithm
3	The Research Of Optimized Density Peaks Clustering And Its Distributed Algorithms
4	Research On Improved Density Peak Clustering Methods Based On K-nearest Neighbors
5	Optimization Research Of Density Peaks Clustering Algorithm Based On Neighbor Searching
6	Improvement And Application Of Density Peaks Clustering
7	Research On Density Peaks Clustering
8	The Research And Application Of Spectral Clustering Algorithm Based On Neighbor Similarity Graph
9	Manifold Density Peak Clustering Algorithm And Its Application Of Weibo Text Classification
10	Research On Manifold-based Density Peaks Clustering Algorithm