Study Of Algorithms For Clustering Categorical Data

Posted on:2009-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:M Wang

Full Text:PDF

GTID:2178360242997738

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

With the development of databases and the Internet, the volume of data collected and stored has grown explosively in recent decades. How to analyze, explore, and discover useful knowledge in these data rapidly and efficiently has become a focus of research. In response to this challenge, cluster analysis has become an active area of data mining. This thesis introduces cluster analysis in detail, covering the clustering methods used in data mining, their characteristics, and the methods for evaluating clustering results. Because datasets vary, cluster analysis must be able to handle diverse data types. The thesis places its emphasis on algorithms for clustering categorical data (CCA).

Among the work on categorical data, the thesis focuses first on the partition approach. The k-modes clustering algorithm and its variations are introduced along with their advantages and disadvantages. On the basis of partition similarity, a new definition of the accuracy of the k-modes algorithm is presented and applied to setting up Cooperative Learning groups. The fuzzy k-modes clustering algorithm is then introduced, and an attribute-weighted version is presented to reflect the different contribution of each attribute of the data set to the clustering. Next, an approximate k-median clustering algorithm for categorical data is covered. With a new fitness function, an evolutionary strategy is used to optimize the weight matrix, and the clustering accuracy based on partition similarity is used to evaluate the clustering result. The experiment gives a better result with the soybean disease data set as the input samples.

Secondly, the basic properties of entropy are characterized, and three entropy-based algorithms are briefly described. A gravity model is then introduced into a new clustering algorithm, which uses the incremental entropy as the radius and the cluster as the quantity.
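For readers unfamiliar with the partition approach named in this abstract, the basic k-modes loop can be sketched as follows. This is a minimal illustration of the standard algorithm, not the thesis's own implementation: the function names, the seeding rule (first k distinct records), and the toy records are all assumptions made for the example.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent value of each attribute across the given records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, max_iter=100):
    """Cluster tuples of categorical values; returns (labels, modes)."""
    # Assumption for this sketch: seed the modes with the first k distinct records.
    modes = list(dict.fromkeys(records))[:k]
    if len(modes) < k:
        raise ValueError("need at least k distinct records")
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest mode.
        new_labels = [
            min(range(k), key=lambda j: matching_distance(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels
        # Update step: recompute each mode attribute by attribute.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data, not taken from the thesis.
records = [
    ("red", "round", "small"),
    ("green", "long", "large"),
    ("red", "round", "large"),
    ("green", "long", "small"),
    ("red", "oval", "small"),
    ("green", "long", "large"),
]
labels, modes = k_modes(records, 2)
print(labels, modes)
```

The fuzzy and attribute-weighted variants mentioned above change the assignment and distance steps; they are not shown here because the abstract does not give their formulas.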
Three criteria, the category utility, the expected entropy, and the purity, were each used to measure the clustering results. For comparison, k-modes and COOLCAT were also run. Every algorithm was run 10 times on the UCI datasets, and the averages serve as the final results for comparison.

Following this, a new subspace clustering algorithm without overlap is proposed. As a rule, clusters may not exist in the full multidimensional space. The sum of a compactness function and a separation function serves as the objective function. Applied to the UCI datasets, the algorithm found different clusters, each in its own subspace.

In short, the study of entropy-based clustering and subspace clustering on categorical data can be developed further.
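Two of the evaluation criteria named above, expected entropy and purity, can be sketched as follows. The abstract does not state the exact formulas used in the thesis, so this assumes the commonly used definitions: expected entropy is the size-weighted average of per-cluster entropies, with each cluster's entropy taken as the sum of its per-attribute entropies (the attribute-independence simplification used by COOLCAT), and purity is the fraction of records that belong to the majority class of their cluster. Function names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def attribute_entropy(values):
    """Shannon entropy (in bits) of one attribute's value distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def expected_entropy(records, labels):
    """Size-weighted average of cluster entropies; lower is better."""
    clusters = defaultdict(list)
    for record, label in zip(records, labels):
        clusters[label].append(record)
    n = len(records)
    return sum(
        (len(members) / n)
        # Assumption: attributes are treated as independent within a cluster.
        * sum(attribute_entropy(col) for col in zip(*members))
        for members in clusters.values()
    )


def purity(labels, classes):
    """Fraction of records in the majority true class of their cluster."""
    counts = defaultdict(Counter)
    for label, cls in zip(labels, classes):
        counts[label][cls] += 1
    return sum(max(c.values()) for c in counts.values()) / len(labels)


# Hypothetical toy data, not taken from the thesis.
records = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "x")]
labels = [0, 0, 1, 1]
classes = ["p", "p", "q", "p"]
print(expected_entropy(records, labels))  # 0.5
print(purity(labels, classes))            # 0.75
```

Expected entropy needs only the records and the cluster labels, while purity also needs ground-truth class labels, which labeled UCI datasets provide.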

Keywords/Search Tags:

clustering analysis, categorical data, entropy, subspace

Related items

1	ESCHCD: Entropy-based Algorithm For Subspace Clustering With High Dimensional Categorical Datasets
2	Studies On Clustering Algorithms For Categorical Data
3	Research On Subspace Clustering Algorithm For Categorical Data
4	Research And Implementation Of Clustering Method For High Dimensional Categorical Data
5	A Study On Clustering Algorithms For Categorical Data With Applications
6	Research On Subspace Clustering Algorithm On High-dimensional Categorical Datasets
7	Research On Interpretable Clustering Algorithms For Categorical Data
8	Research On Algorithms For Subspace Clustering And Outlier Mining Based-on Information-entropy
9	Research On Several Improvements Of Categorical Data Clustering Algorithm
10	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics