
A Kind Of Efficient Clustering Validity Index And Its Application

Posted on: 2015-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: P L Wang
Full Text: PDF
GTID: 2298330452958921
Subject: Control Science and Engineering
Abstract/Summary:
Clustering is a widely used technique and an important part of unsupervised pattern recognition. It has been studied in depth since the last century and has been widely applied in machine learning, data mining, pattern recognition, and other important research fields. The purpose of clustering is to uncover intrinsic connections or rules among objects that appear to be unrelated, by combining similar objects into groups, or clusters.

The key task of clustering analysis is the quantitative evaluation of clustering results, especially the determination of the optimal number of clusters or of a partition structure; the quality of a clustering result is judged by clustering validity. Many validity indices have been proposed for quantitatively assessing the performance of fuzzy clustering algorithms. So far, however, these validity indices have not performed satisfactorily, because of their unreasonable structures and low efficiency in applications.

In this thesis, we study the principle of clustering validity analysis through the introduction and comparison of some typical clustering validity indices. We first propose a criterion based on the Gerschgorin disk theorem for determining the optimal number of clusters. Two new clustering validity indices are then given, based on the k-means algorithm and the FCM algorithm. The main contributions and results are as follows:

1. On the basis of analyzing the principle of clustering validity analysis, we place an emphasis on the process of clustering validity analysis and summarize the principle and application of some clustering validity indices, such as the Xie-Beni index, the DB index, the PB index, and the entropy index. We then compare the commonly used clustering validity indices in terms of running speed and clustering accuracy.

2. We propose a criterion based on the Gerschgorin disk theorem for determining the optimal number of clusters. A correlation matrix is first constructed from the fuzzy clustering results, and an eigenvalue decomposition is then performed to obtain all eigenvalues and eigenvectors of the matrix. Finally, the optimal number of clusters is estimated by means of the classical Gerschgorin disk theorem.

3. An invariant that exists in any data set is proposed. Combining the c-means algorithm and the FCM algorithm, we put forward two clustering validity indices, which are used to evaluate hard clustering results and fuzzy clustering results respectively, and we analyze their characteristics. The correctness, generality, and time efficiency of these two indices are tested through two experiments.
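As a point of reference for the existing validity indices named in the abstract, the Xie-Beni index can be sketched as below. This is only a minimal illustration of the commonly cited definition (fuzzy compactness divided by the number of points times the minimum squared distance between cluster centers), not the indices proposed in the thesis. The function names, the fuzzifier default m = 2, and the toy data are assumptions made for this example.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni index: fuzzy compactness over (n * minimum squared
    separation between cluster centers). Lower values indicate a
    more compact, better separated partition.

    memberships[i][k] is the degree to which points[k] belongs to cluster i.
    """
    if len(centers) < 2:
        raise ValueError("need at least two clusters")
    compactness = sum(
        memberships[i][k] ** m * sq_dist(points[k], centers[i])
        for i in range(len(centers))
        for k in range(len(points))
    )
    separation = min(sq_dist(u, v) for u, v in combinations(centers, 2))
    return compactness / (len(points) * separation)


# Toy data (made up for illustration): two tight groups on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]

# A partition that matches the two groups (crisp memberships are a
# special case of a fuzzy partition).
good = xie_beni(points, [(0.5,), (10.5,)],
                [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])

# A poor partition that splits the first group and merges the rest.
poor = xie_beni(points, [(0.0,), (1.0,)],
                [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0]])
```

In practice such an index would be evaluated for each candidate number of clusters, and the candidate with the smallest value would be taken as the estimate.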
Keywords/Search Tags: Clustering Technology, Clustering Validity Analysis, Clustering Validity Index, Gerschgorin Disk Theorem, Hard Clustering, Fuzzy Clustering