Research On New Clustering Validity Index Based On Improved Clustering Algorithm

Posted on: 2020-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: B B Zhu
Full Text: PDF
GTID: 2428330575465387
Subject: Engineering
Abstract/Summary:
Clustering analysis, an unsupervised learning method, is an important tool for extracting information from data and has been widely used in data mining, pattern recognition, image processing, machine learning, and many other fields. Because it is simple and effective, K-means is one of the most popular partitional clustering algorithms. However, the traditional K-means algorithm is unstable: depending on its parameter settings and the random selection of initial cluster centers, it may produce different partitions of the same dataset. A clustering validity index (CVI) is an important means of evaluating the results produced by a clustering algorithm, which matters because most clustering algorithms cannot determine the optimal number of clusters (Kopt) on their own. Researchers have therefore proposed many CVIs, but most of them have several problems: unstable results, low efficiency, and poor handling of non-spherical datasets and of datasets with many overlapping points. To address these problems, this thesis first improves the traditional K-means algorithm and then proposes two new cluster validity indices based on different clustering algorithms. The main contributions are as follows:

1. Because the random selection of initial cluster centers makes the results of traditional K-means unstable, an improved algorithm, D-K-means, based on dynamic average distance is proposed. Experimental comparisons on a number of datasets show that the improved algorithm is more stable and more accurate.

2. Traditional CVIs may give unstable results on datasets with many overlapping points and large density differences between sample points. A new clustering validity index (NCVI) is therefore proposed, based on the hierarchical clustering algorithm and the minimum spanning tree. In comparisons with six commonly used CVIs on four simulated datasets and two real UCI datasets, the proposed index is more stable and yields more accurate clustering results.

3. Traditional CVIs also give unstable results because the index values fluctuate, and NCVI may perform poorly on non-spherical datasets. To overcome these shortcomings, a new clustering validity index (DCVI) is proposed, based on a linear combination of intra-cluster compactness and inter-cluster separation. It uses a dynamic distance method to find the dynamic average of the sample points between all clusters, with the aim of preventing multiple maximum and minimum points from being generated. This improves the stability of the index and also widens its range of application.

4. A new K-value optimization algorithm (KVOA) for quickly finding the optimal number of clusters is designed by combining the improved K-means algorithm with the newly proposed index. Traditional clustering algorithms require the number of clusters to be fixed before clustering starts, and this choice has a great impact on the result. KVOA therefore uses the newly proposed index to determine the optimal number of clusters more accurately.

5. Clustering algorithms differ in their characteristics: some run fast (partitioning algorithms), while others give stable results (hierarchical algorithms). An extended K-value optimization algorithm (EKVOA) based on different clustering algorithms is therefore proposed. It can process conventional datasets as well as many high-dimensional UCI machine learning datasets (Haberman, Heart, Energy efficiency, and so on).

Finally, the improved algorithm and the two new cluster validity indices (DCVI and NCVI) are tested on six simulated datasets and many UCI machine learning datasets. The experimental results show that the improved D-K-means algorithm is more accurate and more stable than the traditional K-means algorithm, and that the newly proposed DCVI outperforms the six other existing CVIs in stability and scope of application.
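
The general idea behind a K-value optimization procedure (cluster the data for each candidate K, score every partition with a validity index, and keep the best-scoring K) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the deterministic farthest-first seeding merely stands in for an initialization that avoids random centers (it is not D-K-means), and `validity_index` is a simple compactness-to-separation ratio (it is neither NCVI nor DCVI). The function names and the toy dataset are hypothetical.

```python
import math


def farthest_first_centers(points, k):
    """Deterministic seeding (illustrative stand-in, not D-K-means): start from
    the first point, then repeatedly add the point farthest from its nearest
    already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations from deterministic seeds; returns (centers, labels)."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels


def validity_index(points, centers, labels):
    """Illustrative index (not NCVI or DCVI): mean point-to-center distance
    divided by the smallest center-to-center distance. Lower is better."""
    compactness = sum(
        math.dist(p, centers[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    separation = min(
        math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return compactness / separation


def select_k(points, k_min=2, k_max=5):
    """Score every candidate K and return the K with the lowest index value."""
    scores = {}
    for k in range(k_min, k_max + 1):
        centers, labels = kmeans(points, k)
        scores[k] = validity_index(points, centers, labels)
    return min(scores, key=scores.get), scores


# Toy data: three well-separated groups of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

best_k, scores = select_k(points)
print("selected K:", best_k)
for k, score in sorted(scores.items()):
    print(f"K={k}: index={score:.4f}")
```

Because the seeding is deterministic, repeated runs give the same partition and therefore the same selected K, which is the kind of stability the abstract is concerned with. A ratio-style index like this one favors compact, well-separated spherical groups and would likely struggle with the overlapping or non-spherical data that the thesis's own indices target.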
Keywords/Search Tags:K-means algorithm, cluster validity index, optimal cluster number, K value optimization algorithm