Design and evaluation of clustering criterion for optimal hierarchical agglomerative clustering

Posted on: 2002-10-29
Degree: Ph.D
Type: Thesis
University: University of Minnesota
Candidate: Jung, Yunjae
Full Text: PDF
GTID: 2468390011991901
Subject: Computer Science
Abstract/Summary:
Clustering techniques have been broadly used in many areas to retrieve meaningful data patterns hidden in unknown data structures. Although more effective and efficient clustering algorithms have recently been developed, most of them still suffer from uncertainty about clustering optimality. This thesis aims to design a clustering criterion that resolves this problem. The criterion has been designed to work with hierarchical agglomerative clustering methods and to help them find their own optimal clustering. Furthermore, it can be used to estimate the desired number of clusters for partitional clustering algorithms. Use of the criterion does not depend on a particular clustering algorithm, since it requires no a priori parameters.
The criterion is based on the squared error method, which has been widely used as an evaluation measure for clustering techniques. Using the traditional concept of entropy, we interpret clustering as a search process that discovers an optimal configuration at the minimum clustering entropy. The existence of clustering optimality has been proved in multidimensional Euclidean metric space using clustering gain, the opposite concept to entropy. The minimum entropy implies the best tradeoff between two competing trends, i.e., the intra-cluster and inter-cluster error sums. Optimal clustering can be achieved when a hierarchical agglomerative clustering algorithm stops building the dendrogram at the global minimum of clustering entropy. The desired number of clusters and the initial centroids of the clusters can be estimated from the best configuration among the many optimal configurations, and they can then be provided to non-hierarchical partitional clustering methods.
Experimental results illustrate that popular partitional clustering algorithms converge to their optimal clustering configurations very quickly when given the estimated number of clusters and initial centroids. In addition, a new weighting scheme with a dimension-compression technique that improves retrieval effectiveness and classification performance is also presented. Therefore, our clustering criterion provides a promising technique for achieving a higher level of quality for a wide range of clustering techniques.
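The stopping rule described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that the quantity to minimize is an equal-weight sum of the intra-cluster and inter-cluster squared error sums, and that clusters are merged by closest centroids; the exact definition and linkage used in the thesis may differ. The function names `balance` and `agglomerate_with_criterion` are hypothetical. The sketch builds the whole dendrogram and keeps the level with the lowest score, rather than stopping early.

```python
import numpy as np


def balance(points, clusters):
    """Assumed criterion: intra-cluster plus inter-cluster squared error."""
    global_centroid = points.mean(axis=0)
    intra = 0.0  # squared distances of points to their own cluster centroid
    inter = 0.0  # squared distances of cluster centroids to the global centroid
    for members in clusters:
        centroid = points[members].mean(axis=0)
        intra += ((points[members] - centroid) ** 2).sum()
        inter += ((centroid - global_centroid) ** 2).sum()
    return intra + inter


def agglomerate_with_criterion(points):
    """Merge closest-centroid clusters; return the level with minimum balance."""
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]  # start from singletons
    best_score = balance(points, clusters)
    best_clusters = [list(c) for c in clusters]

    while len(clusters) > 1:
        centroids = np.array([points[c].mean(axis=0) for c in clusters])
        best_pair, best_dist = None, np.inf
        # Naive search for the pair of clusters with the closest centroids.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = ((centroids[a] - centroids[b]) ** 2).sum()
                if dist < best_dist:
                    best_pair, best_dist = (a, b), dist
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

        score = balance(points, clusters)
        if score < best_score:  # remember the best dendrogram level so far
            best_score = score
            best_clusters = [list(c) for c in clusters]

    return best_clusters, best_score
```

The number of clusters is then `len(best_clusters)`, and the mean of the points in each returned cluster could serve as an initial centroid for a partitional method such as k-means.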
Keywords/Search Tags:Clustering, Optimal