Algorithms Implementation Of Determining The Number Of Clusters And Initial Cluster Centers For Mixed Data

Posted on: 2021-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: H T Zhou
GTID: 2428330626455178
Subject: Computer technology
Abstract/Summary:
Partition-based methods are among the classical approaches to clustering. Choosing an appropriate number of clusters and suitable initial cluster centers is a problem that these methods must face during initialization. Algorithms for determining the number of clusters and selecting initial cluster centers have been widely studied for data of a single type, with either numerical or categorical attributes. In practical applications, however, most data are described by both numerical and categorical attributes; such data are known as mixed data. This thesis studies how to determine the number of clusters and how to choose the initial cluster centers in partition-based clustering of mixed data.

(1) Based on weighted density, a measure of the dissimilarity between classes of mixed data is given. An algorithm for determining the number of clusters in mixed data is then proposed, and its time complexity is analyzed. Experimental results on UCI data sets show that the method is effective.

(2) Based on the density of objects and the distance between objects, the traditional max-min distance algorithm is extended into an initial cluster center selection algorithm for mixed data, and the method is applied to the K-Prototypes algorithm. Experimental results show that the proposed algorithm performs better than random initialization.

In summary, this thesis studies the initialization of partition-based clustering for mixed data and gives methods for determining the number of clusters and the initial cluster centers, which offers some reference value for clustering mixed data.
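The abstract does not give the formulas used in contribution (2), so the following is only a minimal sketch of the general idea of combining object density with the max-min distance rule on mixed data. It assumes a K-Prototypes-style dissimilarity (squared Euclidean distance on numerical attributes plus a weight `gamma` times the number of mismatched categorical attributes) and an assumed density defined as the inverse of an object's mean dissimilarity to all others. The function names, the density definition, and the `density * min_dist` score are illustrative assumptions, not the thesis's actual algorithm.

```python
import numpy as np

def mixed_distance(num, cat, gamma=1.0):
    """Pairwise dissimilarity for mixed data (K-Prototypes style):
    squared Euclidean distance on numerical columns plus gamma times
    the count of mismatched categorical columns."""
    num_part = ((num[:, None, :] - num[None, :, :]) ** 2).sum(axis=2)
    cat_part = (cat[:, None, :] != cat[None, :, :]).sum(axis=2)
    return num_part + gamma * cat_part

def select_initial_centers(num, cat, k, gamma=1.0):
    """Pick k object indices as initial centers (hypothetical helper).
    The first center is the densest object; each later center maximizes
    density times the distance to its nearest already-chosen center."""
    dist = mixed_distance(num, cat, gamma)
    n = dist.shape[0]
    # Assumed density: inverse of the mean dissimilarity to the other objects.
    density = 1.0 / (dist.sum(axis=1) / (n - 1) + 1e-12)
    centers = [int(np.argmax(density))]
    while len(centers) < k:
        # Max-min step: distance from every object to its nearest chosen center.
        min_dist = dist[:, centers].min(axis=1)
        score = density * min_dist
        score[centers] = -np.inf  # never pick the same object twice
        centers.append(int(np.argmax(score)))
    return centers
```

The returned indices could then be used as the starting prototypes for a K-Prototypes run in place of randomly drawn objects. Weighting the max-min distance by density is one common way to keep isolated outliers from being chosen as centers; whether the thesis uses this exact combination is not stated in the abstract.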
Keywords/Search Tags: Mixed data, clustering, number of clusters, initial cluster center, density