Research On Partitioning Clustering Algorithms For Data With Mixed Numerical And Categorical Attributes

Posted on: 2016-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: G Shen
Full Text: PDF
GTID: 2308330467472724
Subject: Computer Science and Technology
Abstract/Summary:
Clustering analysis is one of the main technologies of data mining. Because it can explore the potential structure of data and group data automatically, it has been widely used in both academia and industry. Most existing clustering algorithms can handle data with either numerical or categorical attributes, but not both, whereas most real-world data sets are described by attributes of mixed types. The values of the two attribute types differ considerably in nature, so traditional clustering algorithms cannot effectively process data that contains both. Research on clustering data with mixed attributes has therefore long been one of the hotspots in clustering analysis. This thesis investigates partitioning clustering algorithms for data with mixed numerical and categorical attributes and proposes two new algorithms.
Based on fuzzy K-prototypes, an attribute-weighted fuzzy K-prototypes algorithm (AWFKP) is proposed. This algorithm combines the ideas of fuzzy membership, fuzzy centroids, and attribute weights. First, a fuzzy centroid is employed to represent the center of each categorical attribute. Second, based on the idea of attribute co-occurrence, a new attribute-weighted dissimilarity measure is designed. Finally, the performance of the proposed algorithm is demonstrated by experiments on a UCI benchmark data set. The experimental results show that the new algorithm obtains better clustering results than traditional algorithms.
A genetic-algorithm-based K-prototypes (GAKP) clustering method is also proposed, in which K-prototypes is applied for local search within the framework of genetic algorithms. A new fitness function based on partition similarity is designed. Random generation and random selection are employed for initialization; roulette wheel selection and an elite strategy are used for selection; simulated binary crossover and single-point crossover are used for the crossover operation; and polynomial mutation and equal-probability mutation are applied for the mutation operation.
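The selection and crossover operators named above can be illustrated with a minimal Python sketch. It is not the GAKP implementation: the abstract does not give the chromosome encoding or the fitness function, so the sketch assumes a chromosome is a plain list of at least two values and that fitness values are non-negative numbers. The names `roulette_select`, `single_point_crossover`, and `next_generation` are hypothetical, and simulated binary crossover, both mutation operators, and the K-prototypes local search are omitted.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        # Assumption: with no usable fitness signal, fall back to a uniform pick.
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if f > 0 and running >= pick:
            return individual
    return population[-1]  # Guard against floating-point rounding.


def single_point_crossover(a, b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    cut = rng.randint(1, len(a) - 1)  # Requires len(a) >= 2.
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def next_generation(population, fitness, rng, n_elite=1):
    """Elite strategy: copy the best individuals unchanged, then fill the
    rest of the new population by roulette selection and crossover."""
    ranked = sorted(zip(fitness, range(len(population))), reverse=True)
    new_pop = [list(population[i]) for _, i in ranked[:n_elite]]
    while len(new_pop) < len(population):
        p1 = roulette_select(population, fitness, rng)
        p2 = roulette_select(population, fitness, rng)
        c1, c2 = single_point_crossover(p1, p2, rng)
        new_pop.append(c1)
        if len(new_pop) < len(population):
            new_pop.append(c2)
    return new_pop
```

Passing a seeded `random.Random` instance as `rng` keeps runs reproducible, which helps when comparing operator choices.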
Finally, the performance of the proposed algorithm is demonstrated by experiments on UCI benchmark data sets. The experimental results show that the new algorithm is more robust than traditional algorithms.
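For context, the baseline K-prototypes dissimilarity that both proposed algorithms start from can be sketched in Python as follows. This is the standard unweighted form (squared Euclidean distance on numerical attributes plus a weighted count of categorical mismatches), not the attribute-weighted measure designed in the thesis, which the abstract does not specify. The function names and the default `gamma` value are assumptions made for illustration.

```python
def kprototypes_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Baseline mixed dissimilarity between an object and a prototype.

    Numerical part: squared Euclidean distance.
    Categorical part: number of mismatched attributes, scaled by gamma.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


def assign(points, prototypes, gamma=1.0):
    """Return, for each (numeric, categorical) point, the index of the
    nearest prototype under the mixed dissimilarity."""
    labels = []
    for x_num, x_cat in points:
        distances = [
            kprototypes_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
            for p_num, p_cat in prototypes
        ]
        labels.append(distances.index(min(distances)))
    return labels
```

The parameter `gamma` balances the two attribute types: a larger value makes categorical mismatches count for more relative to numerical distance.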
Keywords/Search Tags: mixed data, data mining, cluster analysis, K-prototypes, attributes weighted, genetic algorithms