
New Non-hierarchical Clustering Objectives and Algorithms for Optimal Clustering

Posted on: 2007-09-20  Degree: Master  Type: Thesis
Country: China  Candidate: L J Huang  Full Text: PDF
GTID: 2178360185961158  Subject: Crop Genetics and Breeding
Abstract/Summary:
Cluster analysis partitions n individuals or variables into g groups (clusters) according to their similarities, and it is one of the most widely used multivariate analysis methods. Different clustering rules and procedures give rise to different types of cluster analysis, among which hierarchical and non-hierarchical (dynamic) clustering are the two most important. The same data may yield different partitions under different methods, so selecting a suitable clustering method is the basis of optimal clustering. This thesis summarizes and reviews the leading clustering methods and analyzes their characteristics and limitations. Hierarchical clustering has apparent shortcomings and little room for further development, so it is not recommended for practical use. Dynamic clustering generally performs much better and can approach optimal clustering to some extent; however, only two dynamic methods existed, and of these the k-means method has limited ability to reach an optimal clustering. Developing new dynamic clustering methods is therefore one of the main aims of this study. This thesis proposes two new objectives for non-hierarchical clustering: minimal total distance within clusters, min(TDw), and minimal total squared distance within clusters, min(TSDw). These new non-hierarchical clustering objectives serve different clustering needs.

Existing dynamic clustering methods are sensitive to initial partitions, meaning that the final result depends on the initialization. Supplying optimal initial partitions is not a recommended strategy; the key to optimal clustering is an algorithm that reaches the optimum from any initial partition. Another aim of this thesis is therefore to establish an algorithm that achieves optimal clustering with a high degree of confidence. The new algorithm comprises three major procedures: contraction, expansion, and dividing-and-merging.
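The two objectives can be illustrated numerically. In the sketch below, distances are measured from each point to its cluster centroid and are assumed to be Euclidean; the abstract does not spell out the exact definition of "distance within clusters", so this is only an illustration, not the thesis's own formulation:

```python
import numpy as np

def within_cluster_totals(X, labels):
    """Return (TDw, TSDw) for a partition of the rows of X.

    TDw  = total Euclidean distance from each point to its cluster centroid.
    TSDw = total squared Euclidean distance to the cluster centroid.
    Measuring against centroids is an assumption on my part; the abstract
    does not give the precise definition of 'distance within clusters'.
    """
    tdw = tsdw = 0.0
    for g in np.unique(labels):
        pts = X[labels == g]
        # Distances of each member point to its own cluster centroid.
        d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
        tdw += d.sum()
        tsdw += (d ** 2).sum()
    return tdw, tsdw
```

min(TSDw) is the familiar k-means objective, while min(TDw) penalizes outlying points less heavily because distances are not squared, which is one reason separate objectives can serve different clustering needs.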
The contraction procedure achieves an at least locally (possibly globally) optimal clustering. The expansion procedure can escape from local optima and march toward the global optimum. The dividing-and-merging procedure can correct possible wrong partitions, so that optimal clustering is achieved with high efficiency. A Matlab program implementing the algorithm was written and tested on various data types; comparison analyses indicated that the new algorithm performed very satisfactorily on all experimental data types.

How many clusters a given data set should be partitioned into is also an unresolved problem. The thesis compares various determination criteria and finds that the imitated BIC may be a suitable criterion for determining the number of clusters in a data set. Determining the cluster number by an objective criterion is better than doing so subjectively, bringing the cluster solutions nearer to the data's nature.

The thesis has the following main contents and conclusions:
(1) Proposed new non-hierarchical clustering objectives: minimal total distance within clusters, min(TDw), and minimal total squared distance within clusters, min(TSDw). These new objectives serve different clustering needs.
(2) Established a new algorithm aimed at globally optimal clustering, comprising three major procedures: contraction, expansion, and dividing-and-merging. The contraction procedure achieves an at least locally (possibly globally) optimal clustering; the expansion procedure can escape from local optima toward the global optimum; dividing and merging can correct possible wrong partitions, achieving optimal clustering with high efficiency.
(3) Compiled a Matlab program that can display the clustering course. For low dimensions (1-3), it graphically shows the clustering process and results.
This is useful for the user to understand the clustering process and the distributional characteristics of the data.
(4) Presented a new criterion, named the imitated BIC, to determine a suitable cluster number. Compared with existing criteria, the imitated BIC may be a more suitable criterion for determining the number of clusters in a data set. Determining the cluster number by an objective criterion is better than doing so subjectively, bringing the cluster solutions nearer to the data's nature.
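To make the model-selection idea concrete, here is a small sketch. The abstract gives neither the formula for the imitated BIC nor the steps of the contraction procedure, so the sketch substitutes a plain Lloyd-style k-means for the contraction step and a generic BIC-style score, n·ln(TSDw/n) + g·(d+1)·ln(n); both are assumptions made for illustration only:

```python
import numpy as np

def _farthest_first_centers(X, g):
    # Deterministic maxmin initialization: start from X[0], then repeatedly
    # add the point farthest from the centers chosen so far.
    centers = [X[0]]
    for _ in range(g - 1):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[d2.argmax()])
    return np.array(centers)

def kmeans_tsdw(X, g, n_iter=100):
    """Lloyd's k-means returning TSDw, the total squared distance within
    clusters. This stands in for the thesis's contraction procedure, whose
    exact steps the abstract does not give."""
    centers = _farthest_first_centers(X, g)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)  # (n, g) distances
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(g)])
        if np.allclose(new, centers):  # converged to a local optimum
            break
        centers = new
    return ((X - centers[labels]) ** 2).sum()

def imitated_bic(X, g_max=5):
    """Score each candidate cluster number g with a BIC-style criterion and
    return the minimizer. The penalty counts g*(d+1) parameters per model,
    an arbitrary but common choice; the thesis's own formula may differ."""
    n, d = X.shape
    scores = {g: n * np.log(max(kmeans_tsdw(X, g), 1e-12) / n)
                 + g * (d + 1) * np.log(n)
              for g in range(1, g_max + 1)}
    return min(scores, key=scores.get), scores
```

The deterministic farthest-first initialization makes the sketch reproducible; the thesis instead attacks initialization sensitivity directly, through its expansion and dividing-and-merging procedures.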
Keywords/Search Tags: similarity, distance, hierarchical clustering, non-hierarchical clustering, k-means, min(TDw), min(TSDw), global optimal clustering, imitated BIC