Study On Partitioning Clustering Algorithms Based On Mixed Data

Posted on: 2018-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: J S Zhou
GTID: 2348330569985099
Subject: Applied Statistics
Abstract/Summary:
Clustering has been widely applied in many domains to discover useful patterns in data. Most current clustering algorithms are designed for data of a single attribute type, yet most data encountered in practice are mixed, characterized by both numerical and categorical attributes. Clustering mixed data therefore remains a challenging field. In this paper, we propose an improved partitioning clustering algorithm for mixed data, addressing both the dissimilarity measure and the cluster center initialization method.

First, based on the PAM clustering algorithm, an attribute-weighted dissimilarity measure for mixed data is proposed. The measure combines the Manhattan distance for numerical data with the simple matching distance for categorical data: it treats the dissimilarity of the numerical attributes as a whole and uses an entropy metric to determine the weight of each categorical attribute, yielding a new mixed-data dissimilarity measure. Within the PAM algorithm, the weighted dissimilarity measure and the Gower dissimilarity measure are each used for clustering, and their clustering results are compared with those of the traditional k-prototypes algorithm on UCI data sets. The experimental results show that the PAM algorithm with the weighted dissimilarity measure achieves a better clustering effect and higher accuracy.

Second, a density-based method for initializing cluster centers is proposed. The density of a data object is first defined to measure its cohesion within the data set. Then, combining the density with the dissimilarity measure, we calculate the probability that each object becomes a cluster center and select the k objects with the highest probability as the initial cluster centers. Finally, we compare the proposed method with random initialization of cluster centers within the framework of the k-prototypes clustering algorithm. The experimental results show that the density-based initialization method proposed in this paper effectively ensures the stability of the clustering results.
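
The attribute-weighted dissimilarity described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything specific here is an assumption: the function names `entropy_weights` and `mixed_dissimilarity` are hypothetical, each categorical weight is taken to be that attribute's Shannon entropy normalized so the weights sum to 1, and numerical attributes are assumed to be pre-scaled to [0, 1]. It is not the thesis's own implementation.

```python
import math
from collections import Counter


def entropy_weights(cat_columns):
    """Return one weight per categorical attribute.

    Assumption: a weight is the attribute's Shannon entropy divided by the
    total entropy of all categorical attributes. The abstract only says an
    entropy metric is used; the exact form is not stated there.
    """
    entropies = []
    for col in cat_columns:
        n = len(col)
        counts = Counter(col)
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    total = sum(entropies)
    if total == 0:
        # Every attribute is constant: fall back to equal weights.
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]


def mixed_dissimilarity(x_num, y_num, x_cat, y_cat, weights):
    """Manhattan distance on the numerical part (treated as one block)
    plus entropy-weighted simple matching on the categorical part."""
    numeric_part = sum(abs(a - b) for a, b in zip(x_num, y_num))
    categorical_part = sum(w for w, a, b in zip(weights, x_cat, y_cat) if a != b)
    return numeric_part + categorical_part


# Two categorical attributes, stored column-wise, for four objects.
cat_columns = [["a", "a", "b", "b"], ["x", "x", "x", "x"]]
weights = entropy_weights(cat_columns)

# Dissimilarity between two objects with two numerical attributes each.
d = mixed_dissimilarity([0.2, 0.5], [0.4, 0.1], ["a", "x"], ["b", "x"], weights)
```

In this sketch the constant second attribute receives zero weight, so only a mismatch on the first categorical attribute contributes to the categorical part. A matrix of such pairwise values could then be passed to any PAM (k-medoids) implementation that accepts precomputed dissimilarities.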
Keywords/Search Tags:mixed data, partitioning clustering, dissimilarity weighted, cluster centers initialization