
Research Of New Clustering Validity Index In Cluster Analysis

Posted on: 2019-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: P Li
Full Text: PDF
GTID: 2348330545998850
Subject: Computer software and theory
Abstract/Summary:
In data mining, cluster analysis is an important tool for data processing, with wide application in image processing, e-commerce, biology, geographic information and other areas. Clustering is a form of unsupervised machine learning: a clustering algorithm divides a training data set into K clusters when the labels of the training samples are unknown. However, most clustering algorithms share a serious weakness: the optimal number of clusters, Kopt, must be determined in advance for clustering to be effective. Because clustering is unsupervised, it is difficult both to measure how well a dataset fits a clustering algorithm and to determine the optimal cluster number Kopt for that dataset. At present, the Cluster Validity Index (CVI) is an important tool for addressing these problems. This thesis proposes two new CVIs from different perspectives. The main work is as follows:

1. To address the unstable clustering results of the traditional K-means algorithm, the thesis proposes a K-means algorithm that selects the initial cluster centres based on density parameters. Since variance can measure the degree of dispersion among sample points, the thesis proposes a new clustering validity index (referred to as the VCVI index). By combining VCVI with knowledge of spatial fractal geometry, it also gives a reasoned justification of the empirical rule (?).

2. By combining the density-parameter-based K-means algorithm with the new VCVI, the thesis proposes an algorithm for optimizing and determining the value of K. On some datasets with a large amount of data, the VCVI index measures clustering quality more effectively than some commonly used CVIs and determines the optimal clustering number Kopt more efficiently.

3. For some non-spherical distributions, the number of samples and the density vary greatly between clusters and the sample space is more complicated. In these cases VCVI cannot measure the quality of the clustering results well. The thesis therefore draws on minimum-cost spanning trees and Euclidean geometry to propose a second validity index (referred to as the MSTI index).

4. The thesis also proposes a Kopt determination algorithm that combines the Average-Linkage hierarchical clustering algorithm with the MSTI index. Compared with the VCVI index and other CVIs, the MSTI index performs better on clusters with non-spherical distributions and on clusters that differ considerably in sample count and density.

The experimental results show that both new CVIs proposed in this thesis are stable and robust and perform well in measuring clustering quality.
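As a rough illustration of the K-selection idea described in the abstract, the Python sketch below runs K-means for each candidate K, scores each partition with a validity index, and keeps the K with the best score. The abstract gives neither the VCVI formula nor the density-based seeding rule, so both are replaced here by stand-ins: the Calinski-Harabasz variance-ratio criterion and a simple farthest-point seeding. All function names, the toy data and the upper bound k_max = 6 are assumptions made for illustration only.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def farthest_point_init(points, k):
    """Stand-in seeding (NOT the thesis's density-based rule): start from the
    first point, then repeatedly add the point farthest from chosen centres."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, iters=100):
    """Lloyd's K-means; returns one cluster label per point."""
    centres = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster empties out
                centres[j] = mean(members)
    return labels


def variance_ratio_index(points, labels):
    """Calinski-Harabasz criterion (stand-in for VCVI): between-cluster over
    within-cluster dispersion, each divided by its degrees of freedom.
    Larger values indicate a better partition."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    n, k = len(points), len(clusters)
    if k < 2:
        return 0.0  # index is undefined for one cluster; treat as worst
    overall = mean(points)
    between = within = 0.0
    for members in clusters.values():
        c = mean(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, k_max):
    """Score K = 2..k_max and return (best K, {K: score})."""
    scores = {k: variance_ratio_index(points, kmeans(points, k))
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated 2-D blobs (an assumption, not thesis data).
rng = random.Random(0)
blob_centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centres for _ in range(30)]
best_k, scores = choose_k(points, k_max=6)
print(best_k, scores)
```

With three well-separated blobs the score should peak at K = 3. Swapping in a different index only requires replacing variance_ratio_index, which is the step where the thesis's own VCVI would go.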
Keywords/Search Tags: Cluster analysis, Optimal clustering number Kopt, Cluster validity index, Variance, Minimum cost spanning tree