Research On Hierarchical Clustering Algorithm Based On Silhouette

Posted on:2011-06-01

Degree:Master

Type:Thesis

Country:China

Candidate:D M Zhang

Full Text:PDF

GTID:2178360302494501

Subject:Computer software and theory

Abstract/Summary:

A survey of clustering research in China and abroad shows that existing clustering algorithms have several problems. Traditional hierarchical clustering algorithms require a termination parameter to be fixed in advance, and determining that parameter has high time complexity. Most hierarchical clustering algorithms do not make full use of existing background knowledge, which degrades the quality of the clustering result. In addition, few hierarchical clustering algorithms have been applied to sequence data. To address these problems, this thesis studies hierarchical clustering algorithms based on the silhouette. Solving these problems is significant for the life sciences, medicine, social science, geography and other fields.

First, a hierarchical clustering algorithm based on the silhouette is proposed. The number of clusters is determined by incrementally drawing the curve of the mean improved silhouette of the dataset. In the subsequent agglomerative hierarchical clustering phase, entropy is introduced as a new similarity measure. Outlier clusters are identified by calculating the weighted distance between clusters.

Second, a hierarchical clustering algorithm based on the silhouette and constraints is proposed. Existing pairwise instance-level constraints are incorporated into the algorithm and used to update the cohesion matrix. A penalty factor is introduced to handle violations of must-link and cannot-link constraints.

Finally, a silhouette-based hierarchical clustering algorithm for sequences in software security analysis is proposed. Given existing sequence patterns, a fault feature matrix is defined to reflect the relation between each fault feature and the corresponding row vector, so the clustering of sequences can be transformed into the clustering of row vectors. Clustering the existing fault sequences reduces the matching scale of software fault feature analysis.
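
As a rough illustration of the first idea in the abstract (choosing the number of clusters from a silhouette curve), the following minimal Python sketch may help. It is not the thesis's algorithm: it uses the standard mean silhouette coefficient rather than the "improved" silhouette, plain average-linkage merging rather than the entropy-based similarity, and Euclidean distance. The function names agglomerate, mean_silhouette and choose_k are hypothetical, introduced only for this example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, k):
    """Merge clusters bottom-up until k remain.

    Assumption: average linkage stands in for the thesis's entropy-based measure.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def mean_silhouette(points, clusters):
    """Mean of s(p) = (b - a) / max(a, b); points in singleton clusters score 0."""
    total = 0.0
    for ci, cluster in enumerate(clusters):
        if len(cluster) == 1:
            continue
        for p in cluster:
            # a: mean distance to the other members of p's own cluster
            a = sum(euclidean(points[p], points[q])
                    for q in cluster if q != p) / (len(cluster) - 1)
            # b: smallest mean distance from p to any other cluster
            b = min(sum(euclidean(points[p], points[q]) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_min=2, k_max=None):
    """Return the k with the highest mean silhouette, plus the whole curve."""
    if k_max is None:
        k_max = len(points) - 1
    scores = {k: mean_silhouette(points, agglomerate(points, k))
              for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
best_k, scores = choose_k(points, 2, 5)
print(best_k, scores)
```

On this toy data the silhouette curve peaks at three clusters. The sketch recomputes the hierarchy for every candidate k, which is simple but wasteful; an incremental version, closer in spirit to what the abstract describes, would evaluate the silhouette after each merge of a single run.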

Keywords/Search Tags:

Hierarchical clustering, Silhouette, K-means, Entropy, Constraint, Sequence

Related items

1	Research On K-Means Clustering Method With Structural Constraint
2	The Study And Development Of Hierarchical-K-means-Based Clustering Algorithm
3	A Relational Framework for Clustering and Cluster Validity and the Generalization of the Silhouette Measure
4	Research On K-means Clustering Algorithm Based On Differential Privacy Protection
5	Research On Mutual Information Hierarchical Clustering Based On Grassberger Entropy Estimator
6	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel
7	Study On The Application Of The Improved K-means Clustering Algorithm In Image Retrieval
8	New Non-hierarchical Clustering Objectives And The Algorithms To Optimal Clustering
9	Cluster Study Based On Functional Magnetic Resonance Imaging Data
10	Based On Entropy And Distance Weighted Multi-angle Fuzzy Clustering