Research On K-modes Clustering Algorithm Of Dissimilarity Measure

Posted on:2013-06-01

Degree:Master

Type:Thesis

Country:China

Candidate:C T Chen

GTID:2248330371990214

Subject:Computer software and theory

Abstract/Summary:

Clustering analysis is an important research topic in data mining. It divides large, complex data sets into a series of classes (clusters) so that objects in the same cluster are similar, while objects in different clusters are dissimilar. Among the many clustering methods, partitioning methods are the most common, especially the classical k-means algorithm, which is widely used in industrial and scientific fields. K-means performs well on numerical data and produces good clustering results, but it cannot handle categorical (character) data. Exploring and improving clustering algorithms for categorical data is therefore an important topic in cluster analysis. The k-modes algorithm extends k-means to address this limitation. In this thesis, we studied the problem of clustering categorical data and compared several improved k-modes algorithms. However, the dissimilarity measure of the classical k-modes algorithm (simple matching, which only counts the attributes on which two objects differ) cannot reflect latent similarity relations between objects. When the data set is large and complex, it cannot distinguish the differences between objects well. To solve this problem, we improved the dissimilarity measure with the aim of strengthening the distinctions between objects.
Finally, we proposed a new k-modes clustering algorithm. The contributions of this thesis are as follows:
(1) We summarized the background of the research and reviewed related work on partitioning-based clustering.
(2) We introduced the classification of traditional clustering methods and described the components of cluster analysis, including data structures, dissimilarity measures and clustering criterion functions.
(3) We described the idea and procedure of the k-modes algorithm in detail and analyzed its advantages and disadvantages.
(4) The original dissimilarity measure cannot reflect the similarity within a sub-class (cluster). To address this, we defined a function on attribute values. The function describes two things: the importance of an attribute value within its attribute, and how well the cluster center represents that attribute. It thereby quantifies the inherent relationship between objects and attributes. Based on this function, we proposed a new dissimilarity measure. It reflects different degrees of dissimilarity between objects that share the same attribute value and strengthens the similarity within a sub-class.
(5) Using the new dissimilarity measure, we improved the k-modes algorithm. Experimental results show that the new k-modes algorithm outperforms both the classical k-modes algorithm and Ng's k-modes algorithm to some extent.
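The abstract describes the classical k-modes procedure and its weakness without giving formulas. The following is a minimal Python sketch of that baseline under stated assumptions. It uses simple matching dissimilarity, a per-attribute mode update, and reallocation until no object changes cluster.

The thesis's own attribute-value function is not specified in the abstract and is not reproduced here. Instead, `frequency_based` assumes the commonly cited form of the measure used in Ng's k-modes algorithm. In that measure, a match on attribute j costs 1 - |C_{l,j,r}|/|C_l|, where |C_l| is the size of cluster l and |C_{l,j,r}| is the number of its objects holding the matched value. This illustrates how cluster statistics can make a match cost something between 0 and 1. All function names, the first-k-distinct initialization and the toy data are illustrative assumptions.

```python
from collections import Counter


def simple_matching(mode, obj, counts=None, size=None):
    """Classical k-modes dissimilarity: the number of attributes on which
    the mode and the object differ (cluster statistics are ignored)."""
    return sum(1 for z, x in zip(mode, obj) if z != x)


def frequency_based(mode, obj, counts, size):
    """Assumed form of Ng's frequency-based measure: a mismatch costs 1;
    a match costs 1 - (share of the cluster holding that value)."""
    if size == 0:  # empty cluster: no frequencies, fall back to simple matching
        return float(simple_matching(mode, obj))
    total = 0.0
    for j, (z, x) in enumerate(zip(mode, obj)):
        total += 1.0 if z != x else 1.0 - counts[j][z] / size
    return total


def cluster_stats(data, labels, k):
    """Per-cluster value counts for every attribute, plus cluster sizes."""
    m = len(data[0])
    counts = [[Counter() for _ in range(m)] for _ in range(k)]
    sizes = [0] * k
    for obj, l in zip(data, labels):
        sizes[l] += 1
        for j, v in enumerate(obj):
            counts[l][j][v] += 1
    return counts, sizes


def update_modes(counts, sizes, old_modes):
    """New mode = most frequent value of each attribute within the cluster."""
    modes = []
    for l, size in enumerate(sizes):
        if size == 0:
            modes.append(old_modes[l])  # keep the previous mode of an empty cluster
        else:
            modes.append(tuple(c.most_common(1)[0][0] for c in counts[l]))
    return modes


def k_modes(data, k, dissim=simple_matching, max_iter=100):
    """Minimal k-modes loop. Assumption: the first k distinct objects
    serve as the initial modes."""
    modes = []
    for obj in data:
        if tuple(obj) not in modes:
            modes.append(tuple(obj))
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("fewer than k distinct objects")

    # Initial allocation uses simple matching: no cluster statistics exist yet.
    labels = [min(range(k), key=lambda l: simple_matching(modes[l], x)) for x in data]
    for _ in range(max_iter):
        counts, sizes = cluster_stats(data, labels, k)
        modes = update_modes(counts, sizes, modes)
        new_labels = [
            min(range(k), key=lambda l: dissim(modes[l], x, counts[l], sizes[l]))
            for x in data
        ]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
    counts, sizes = cluster_stats(data, labels, k)
    return labels, update_modes(counts, sizes, modes)


if __name__ == "__main__":
    # Hypothetical toy data set with three categorical attributes.
    data = [("a", "x", "p"), ("b", "z", "r"), ("a", "x", "q"),
            ("a", "y", "p"), ("b", "z", "s"), ("c", "z", "r")]
    labels, modes = k_modes(data, 2, dissim=frequency_based)
    print(labels, modes)  # [0, 1, 0, 0, 1, 1] [('a', 'x', 'p'), ('b', 'z', 'r')]
```

The dissimilarity is passed in as a parameter so the allocation loop stays unchanged. This is in the spirit of contribution (5), where a new measure is combined with the k-modes procedure. A different measure can be swapped in by writing another function with the same `(mode, obj, counts, size)` signature.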

Keywords/Search Tags:

clustering analysis, k-modes clustering algorithm, dissimilarity measure, categorical attribute

Related items

1	Studies On Clustering Algorithms For Categorical Data
2	Research On Subspace Clustering Algorithm On High-dimensional Categorical Datasets
3	Research On Several Improvements Of Categorical Data Clustering Algorithm
4	An K-modes Clustering Algorithm Based On Dynamic Weight
5	Research Of Clustering Algorithms For Categorical Data
6	The Optimization Research On K-Modes Clustering Algorithm
7	Design And Implementation Of K-modes Type Algorithms Based On R For Categorical Data
8	Research On Clustering Algorithm For Mixed-type Data Based On K-modes Algorithm
9	Research On Clustering Based On Attribute Characteristics For Categorical And Binary Data
10	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics