Research On Partitioning Clustering Algorithms For Data With Mixed Numerical And Categorical Attributes

Posted on:2016-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:G Shen

GTID:2308330467472724

Subject:Computer Science and Technology

Abstract/Summary:

Cluster analysis is one of the core techniques of data mining. Because it can explore the latent structure of data and classify data automatically, it has been widely used in both academia and industry. Most existing clustering algorithms handle data with either numerical or categorical attributes, but real-world data sets are mostly described by attributes of mixed type. Because the values of the two attribute types differ in nature, traditional clustering algorithms cannot effectively process data that contains both. Clustering of mixed-attribute data has therefore long been an active topic in cluster analysis. This thesis investigates partitioning clustering algorithms for data with mixed numerical and categorical attributes and proposes two new algorithms.

First, building on fuzzy K-prototypes, an attribute-weighted fuzzy K-prototypes algorithm (AWFKP) is proposed. The algorithm combines fuzzy membership, fuzzy centroids, and attribute weights. Fuzzy centroids are used to represent the cluster centers of categorical attributes, and a new attribute-weighted dissimilarity measure is designed based on attribute co-occurrence. Experiments on UCI benchmark data show that the algorithm yields better clustering results than traditional algorithms.

Second, a genetic-algorithm-based K-prototypes clustering method (GAKP) is proposed, in which K-prototypes performs local search within the framework of a genetic algorithm. A new fitness function based on partition similarity is designed. Random generation and random selection are used for initialization; roulette-wheel selection and an elite strategy for selection; simulated binary crossover and single-point crossover for crossover; and polynomial mutation and equal-probability mutation for mutation. Experiments on UCI benchmark data sets show that the algorithm is more robust than traditional algorithms.
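
For orientation, the following is a minimal sketch of the standard (unweighted, crisp) K-prototypes dissimilarity that both proposed algorithms build on: squared Euclidean distance on the numerical attributes plus a weighted count of mismatches on the categorical attributes. It is not the attribute-weighted measure of AWFKP, which the abstract does not specify. The function names, the `gamma` trade-off parameter name, and the sample values are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# A mixed-type object: (numerical values, categorical values).
MixedPoint = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(point: MixedPoint, prototype: MixedPoint,
                        gamma: float = 1.0) -> float:
    """Standard K-prototypes dissimilarity (illustrative sketch).

    Numerical part: squared Euclidean distance.
    Categorical part: number of mismatching attribute values.
    gamma (assumed name) balances the two parts.
    """
    point_num, point_cat = point
    proto_num, proto_cat = prototype
    numeric_part = sum((a - b) ** 2 for a, b in zip(point_num, proto_num))
    categorical_part = sum(1 for a, b in zip(point_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(point: MixedPoint, prototypes: List[MixedPoint],
                      gamma: float = 1.0) -> int:
    """Return the index of the prototype with the smallest dissimilarity."""
    distances = [mixed_dissimilarity(point, p, gamma) for p in prototypes]
    return distances.index(min(distances))


# Hypothetical sample data: two numerical and two categorical attributes.
sample_point: MixedPoint = ([1.0, 2.0], ["red", "small"])
sample_prototypes: List[MixedPoint] = [
    ([0.0, 2.0], ["red", "large"]),
    ([5.0, 5.0], ["red", "small"]),
]
sample_assignment = nearest_prototype(sample_point, sample_prototypes, gamma=0.5)
```

In this sketch the assignment step of a partitioning algorithm reduces to calling `nearest_prototype` for each object. A larger `gamma` gives categorical mismatches more influence relative to numerical distance.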

Keywords/Search Tags:

mixed data, data mining, cluster analysis, K-prototypes, attribute weighting, genetic algorithms

Related items

1	Research On Partitional Clustering Algorithms For Mixed Data
2	Research On Clustering Algorithms For The Data With Multidimensional Mixed Attributes
3	A Study Of The Clustering Algorithm For Mixed Data
4	The Research On Clustering Algorithm For Mixed Numeric And Categorical Values Based Partitioning Methods
5	Research And Application Of Clustering Analysis Algorithm Based On Mixed Attribute Data
6	Research In Data Mining Method Based On Genetic Algorithms
7	Research In Data Mining Based On Genetic Algorithms
8	Data Mining Technology And Its Application In The Supermarket In Crm
9	The Research & Application Of Data Mining Base On Genetic Algorithms
10	Research On Ensemble Clustering Algorithms For Complex Data