Research And Application On Determining Optimal Number Of Clusters In Cluster Analysis

Posted on: 2012-03-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S B Zhou
GTID: 1118330368989488
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Cluster analysis is an important research topic in data mining, pattern recognition and machine learning. As an important method for analyzing and understanding data, it has a long research history, and its importance and interdisciplinary nature have long been recognized. With the development of artificial intelligence and data mining technology, and especially with the rapid emergence of diverse data such as image data, text data, DNA data, time series data and Web data, cluster analysis has progressed rapidly.

Despite these achievements, many open questions remain. This dissertation focuses on determining the optimal number of clusters. This is an important problem in cluster analysis, a key factor in clustering quality, and the main task of clustering validity analysis. Centering on this problem, the dissertation studies clustering algorithms and clustering validity in depth and applies the resulting algorithms to image segmentation. The main contents and research results are summarized as follows.

Firstly, the background and current state of the research subject are introduced, and the basics of cluster analysis, clustering validity and image segmentation are reviewed.

Secondly, two methods, KMBWP and IKMS, are proposed for determining the optimal number of clusters in the K-means clustering algorithm. The KMBWP algorithm uses the BWP validity index to analyze the validity of the clustering results produced by K-means and thereby determines the optimal number of clusters. The IKMS algorithm improves the way the initial cluster centers are set in K-means and uses the Silhouette validity index to determine the optimal number of clusters.
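As a rough illustration of choosing the number of clusters with the Silhouette index, the following sketch runs a plain K-means for each candidate k and keeps the k with the highest mean silhouette. It is not the IKMS algorithm itself: the farthest-first initialization, the function names (`kmeans`, `silhouette`, `best_k`) and the candidate range are assumptions made for this example, and the abstract does not say how IKMS actually sets its initial centers.

```python
import math


def farthest_first_centers(points, k):
    # Assumed initialization (not taken from the dissertation): start at the
    # first point, then repeatedly add the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    # Standard Lloyd iterations; returns one cluster label per point.
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    # Mean silhouette width; a singleton cluster contributes 0 for its point.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # the index is undefined for one cluster; 0 is a placeholder
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_min=2, k_max=10):
    # Score every candidate k and return the best k together with all scores.
    k_max = min(k_max, len(points) - 1)
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores
```

Calling `best_k(points, k_max=5)` on a list of coordinate tuples returns the selected k and a dictionary of per-k scores, so the whole score curve can be inspected rather than only its maximum.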
The experimental results on UCI datasets and artificial datasets indicate the effectiveness of the proposed algorithms.

Thirdly, the APBWP algorithm is proposed for determining the optimal number of clusters based on the Affinity Propagation clustering algorithm. The performance of six common validity indexes is compared, and the method of using the IGP index to determine the optimal number of clusters is improved. To broaden the applicability of BWP, the definition of the BWP index is revised. On the basis of the redefined BWP index, the APBWP algorithm uses BWP to analyze the validity of the clustering results produced by Affinity Propagation and thereby determines the optimal number of clusters. Theoretical analysis and experimental results indicate the effectiveness of the proposed algorithms.

Fourthly, the AHBC algorithm is proposed for determining the optimal number of clusters based on the agglomerative hierarchical clustering algorithm. From the standpoint of sample geometry, a new clustering validity index called the CSP index is designed, which can evaluate clustering results on datasets with nonconvex structure. Built on agglomerative hierarchical clustering, AHBC uses the BWP validity index to analyze clustering results on convex-structure datasets and the CSP validity index on nonconvex-structure datasets, and then uses these indexes to determine the optimal number of clusters. Theoretical analysis and experimental results indicate the effectiveness of the proposed algorithm.

Fifthly, an image segmentation algorithm based on Affinity Propagation clustering and the BWP index is proposed. The algorithm uses the BWP index to analyze the validity of the clustering results produced by Affinity Propagation and thereby determines the optimal number of segments.
Given the optimal number of segments, the proposed algorithm produces the segmentation of the gray-level image. To reduce the time complexity of computing the similarity matrix and of the validity analysis, gray values rather than pixels are clustered in gray space, and the clustering results are then mapped back to pixel space to produce the segmentation. In addition, absolute value distance is used instead of Euclidean distance in the BWP index when clustering in gray space. Experimental results on many kinds of images indicate the validity and good performance of the proposed algorithm.

Finally, the work done in this dissertation is summarized, and directions for future research are outlined.
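The idea of clustering gray values instead of pixels can be sketched as follows. This is only a minimal illustration under stated assumptions: a simple one-dimensional K-means with a fixed `k` stands in for Affinity Propagation, no BWP analysis is performed, and the function name `segment_by_gray_levels` and its parameters are hypothetical. The point it shows is that the clustering works on at most 256 distinct gray levels, compared by absolute difference, no matter how many pixels the image has.

```python
from collections import Counter


def segment_by_gray_levels(image, k, iters=50):
    """Label every pixel by clustering distinct gray values (hypothetical helper).

    image: list of rows, each a list of integer gray values (e.g. 0..255).
    Returns a label image of the same shape.
    """
    counts = Counter(g for row in image for g in row)  # pixels per gray level
    levels = sorted(counts)                            # distinct gray values only
    k = min(k, len(levels))
    n = len(levels)
    # Initial centers spread evenly over the sorted gray levels (an assumption).
    centers = [float(levels[(2 * j + 1) * n // (2 * k)]) for j in range(k)]

    for _ in range(iters):
        # Assign each gray LEVEL (not each pixel) using absolute value distance.
        assign = {g: min(range(k), key=lambda j: abs(g - centers[j])) for g in levels}
        new_centers = []
        for j in range(k):
            members = [g for g in levels if assign[g] == j]
            if members:
                weight = sum(counts[g] for g in members)
                # Weight by pixel count so the center equals the mean over pixels.
                new_centers.append(sum(g * counts[g] for g in members) / weight)
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers

    # Map the gray-space labels back to pixel space.
    return [[assign[g] for g in row] for row in image]
```

For scalar gray values, the absolute difference gives the same ordering of neighbors as the Euclidean distance, so the substitution saves the square and square root without changing which level is nearest.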
Keywords/Search Tags: cluster analysis, clustering validity index, optimal number of clusters, K-means clustering, Affinity Propagation clustering, hierarchical clustering, image segmentation