A categorical data clustering approach with expectation maximization and K-nearest neighbour

Posted on: 2004-07-31
Degree: M.Sc
Type: Thesis
University: University of Windsor (Canada)
Candidate: Liu, Yu
Full Text: PDF
GTID: 2468390011460172
Subject: Computer Science
Abstract/Summary:
Clustering analysis is an important research area in data mining. The goal of clustering is to group the objects in a data set into meaningful subclasses. Many algorithms have been designed for clustering numerical data or for clustering categorical data, but little attention has been paid to clustering mixed-type data sets, whose objects have both numerical and categorical attributes. This thesis proposes an approach to this problem. The method is called CCEM-KNN, which stands for Categorical data Clustering approach with Expectation Maximization and K-Nearest Neighbour. First, we apply a categorical clustering method to the categorical attributes of all data objects to obtain an initial partition. Then, starting from this partition, we apply the Expectation-Maximization classification algorithm to the numerical attributes of each cluster to create a sample data set. Finally, we apply another classification algorithm, K-Nearest Neighbour, which classifies the data objects using the sample data set we created. In this way, we solve the mixed-type clustering problem. Experiments show that CCEM-KNN performs better than previous work and also handles large data sets well.
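The three stages described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the categorical clustering method, the EM model, or the rule for building the sample data set, so everything below is an assumption chosen for illustration. A simple k-modes-style procedure with Hamming distance stands in for the categorical step, a diagonal Gaussian mixture seeded by the initial partition stands in for the EM step, and the sample data set is taken to be the objects whose cluster posterior reaches a hypothetical `threshold`. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

def hamming(a, b):
    # Number of categorical attributes on which two objects differ.
    return sum(x != y for x, y in zip(a, b))

def kmodes(cat_rows, k, iters=10):
    # Assumed stand-in for the categorical step: seed with the first k
    # distinct rows, then alternate assignment and mode updates.
    modes = []
    for row in cat_rows:
        if row not in modes:
            modes.append(row)
        if len(modes) == k:
            break
    labels = [0] * len(cat_rows)
    for _ in range(iters):
        labels = [min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
                  for r in cat_rows]
        for j in range(len(modes)):
            members = [r for r, l in zip(cat_rows, labels) if l == j]
            if members:
                modes[j] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels

def log_gauss(x, mean, var):
    # Log density of a Gaussian with diagonal covariance.
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def em_posteriors(num_rows, labels, k, iters=20, min_var=1e-3):
    # Assumed EM step: a diagonal Gaussian mixture over the numerical
    # attributes, initialised from the hard categorical partition.
    n, d = len(num_rows), len(num_rows[0])
    resp = [[1.0 if labels[i] == j else 0.0 for j in range(k)] for i in range(n)]
    for _ in range(iters):
        params = []
        for j in range(k):  # M-step: weight, mean, variance per cluster
            w = sum(resp[i][j] for i in range(n)) + 1e-12
            mean = [sum(resp[i][j] * num_rows[i][c] for i in range(n)) / w
                    for c in range(d)]
            var = [max(min_var, sum(resp[i][j] * (num_rows[i][c] - mean[c]) ** 2
                                    for i in range(n)) / w) for c in range(d)]
            params.append((w / n, mean, var))
        for i in range(n):  # E-step: normalised posteriors, computed in log space
            logs = [math.log(p) + log_gauss(num_rows[i], m, v) for p, m, v in params]
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp[i] = [e / total for e in exps]
    return resp

def knn_label(x, samples, k_neighbours):
    # Majority vote among the k nearest labelled samples (Euclidean distance).
    nearest = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k_neighbours]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def ccem_knn_sketch(records, k, threshold=0.9, k_neighbours=3):
    # records: list of (categorical_tuple, numerical_tuple) pairs.
    cat_rows = [r[0] for r in records]
    num_rows = [r[1] for r in records]
    labels = kmodes(cat_rows, k)                # step 1: initial partition
    resp = em_posteriors(num_rows, labels, k)   # step 2: EM on numerical attributes
    samples = [(x, max(range(k), key=lambda j: p[j]))
               for x, p in zip(num_rows, resp) if max(p) >= threshold]
    if not samples:                             # no confident objects: keep step 1
        return labels
    return [knn_label(x, samples, k_neighbours) for x in num_rows]  # step 3
```

Calling `ccem_knn_sketch(records, k=2)` on a list of `(categorical_tuple, numerical_tuple)` pairs returns one cluster label per object. The confidence threshold and the choice to run K-Nearest Neighbour only on the numerical attributes are design guesses; the thesis may construct the sample data set and measure distance differently.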
Keywords/Search Tags:Clustering, Data set, Expectation maximization and k-nearest neighbour