Research On The New Validity Index Of Internal Clustering And The Method To Determine The Optimal Cluster Number

Posted on: 2022-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
GTID: 2518306542963099
Subject: Software engineering
Abstract/Summary:
Clustering is an important machine learning method and is widely used in many fields that involve data classification. It has been applied with good results to user data mining, graphic pattern recognition, image segmentation, and other tasks. Because clustering is unsupervised, however, evaluating the quality of a clustering result is itself an important research problem. In addition, many clustering algorithms require the number of clusters in the target dataset to be set before they run, yet this number is often not known in advance. Focusing on these two problems, this dissertation classifies and analyzes in depth the existing internal clustering validity indexes according to their construction principles, and makes the following contributions:

1. To address the problems of outliers and unbalanced datasets, a midpoint-involved distance (mpdist) is defined to measure the separation between clusters.
2. A new mpdist-based clustering validity index (MPC) is proposed to evaluate effectively the results of various clustering algorithms on various types of datasets. Because it runs in linear time, the MPC index can process large real datasets quickly.
3. Based on the principles of the agglomerative hierarchical clustering (AHC) algorithm, the MPC index is embedded into the AHC procedure in a new iterative way. The resulting algorithm, AHMPC (AHC and MPC based clustering algorithm), efficiently determines the optimal number of clusters (Kopt) of a target dataset. Because it avoids the time cost of running a clustering algorithm repeatedly to determine Kopt, the new algorithm is better suited to large datasets.

Experimental results on 40 datasets of different types show that the proposed clustering method is stable and effective.
Keywords/Search Tags: Cluster analysis, Cluster algorithm, Cluster validity index, Optimal cluster number
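
The idea in item 3 of the abstract is to score each level of an agglomerative merge sequence with an internal validity index, rather than rerunning a clustering algorithm once per candidate cluster number. It can be illustrated with a minimal Python sketch. The abstract does not give the definitions of mpdist or MPC, so the score below is a simple stand-in (minimum centroid separation divided by mean within-cluster distance) and is not the thesis's index. The names `standin_index`, `estimate_k`, and `k_max`, and the demo points, are all hypothetical. The sketch uses naive centroid-linkage merging and makes no attempt to reproduce the efficiency claimed for AHMPC.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def standin_index(clusters):
    """Stand-in validity score (NOT the MPC index from the thesis).

    Separation   = smallest distance between any two cluster centroids.
    Compactness  = mean distance from each point to its own centroid.
    A higher separation/compactness ratio means a better partition.
    """
    cents = [centroid(c) for c in clusters]
    separation = min(
        math.dist(cents[i], cents[j])
        for i in range(len(cents))
        for j in range(i + 1, len(cents))
    )
    n = sum(len(c) for c in clusters)
    compactness = sum(
        math.dist(p, cents[i]) for i, c in enumerate(clusters) for p in c
    ) / n
    return separation / compactness if compactness > 0 else float("-inf")


def estimate_k(points, k_max=10):
    """Merge bottom-up once, scoring every level with 2 <= K <= k_max."""
    clusters = [[p] for p in points]  # start from singleton clusters
    best_k, best_score = None, float("-inf")
    while len(clusters) > 2:
        cents = [centroid(c) for c in clusters]
        # Centroid linkage: merge the pair whose centroids are closest.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: math.dist(cents[ab[0]], cents[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        k = len(clusters)
        if k <= k_max:
            score = standin_index(clusters)
            if score > best_score:
                best_k, best_score = k, score
    return best_k, best_score


# Hypothetical demo data: three small, well-separated groups of four points.
demo_points = [
    (x + dx, y + dy)
    for (x, y) in [(0, 0), (20, 0), (0, 20)]
    for (dx, dy) in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
best_k, best_score = estimate_k(demo_points)
print(best_k, round(best_score, 2))
```

Because the hierarchy is built only once, every candidate cluster number is scored as a by-product of the same merge sequence. A real implementation would replace `standin_index` with the index actually under study and would use a more efficient merge procedure than the exhaustive pair search shown here.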