
Research On K-modes Clustering Algorithm Of Dissimilarity Measure

Posted on: 2013-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: C T Chen
GTID: 2248330371990214
Subject: Computer software and theory
Abstract/Summary:
Clustering analysis is an important research topic in data mining. It divides large, complex data sets into a series of classes such that objects in the same cluster are similar, while objects in different clusters are dissimilar. Among clustering methods, partitioning methods are the most common, especially the classical k-means algorithm, which is widely used in industrial and scientific fields. K-means performs well on numerical data and produces good clustering results, but it cannot handle categorical (character) data. Exploring and improving clustering algorithms for categorical data is therefore an important topic in clustering analysis.

The k-modes clustering algorithm was proposed as an extension of k-means to address this limitation. In this thesis, we study the problem of clustering categorical data and compare a variety of improved k-modes algorithms. However, the dissimilarity measure of the classical k-modes algorithm cannot reflect the potential similarity relations between objects, and when the data set is large and complex it cannot distinguish the differences between objects. To solve this problem, we improve the dissimilarity measure to strengthen the differences between objects, and on that basis propose a new k-modes clustering algorithm.

The contributions of this thesis are as follows:
(1) We summarize the background of our research and the related work on partitioning-based clustering.
(2) We introduce the classification of traditional clustering methods and describe the elements of clustering analysis, including the data structure, the dissimilarity measure, and the clustering criterion function.
(3) We give a detailed description of the idea and process of k-modes and analyze its advantages and disadvantages.
(4) To address the problem that the original dissimilarity measure cannot reflect similarity within a sub-class, we define a function based on attribute values. The function describes the importance of an attribute value for its attribute and how well the cluster center represents that attribute, and it quantifies the inherent relationship between objects and attributes. We propose a dissimilarity measure based on this function. The measure reflects the degree of dissimilarity between different objects that share the same attribute value, and it strengthens similarity within the sub-class.
(5) We improve the k-modes algorithm by combining it with the new dissimilarity measure. The experimental results show that the new k-modes algorithm outperforms the classical k-modes algorithm and Ng's k-modes algorithm to some extent.
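
For orientation, the classical k-modes baseline that the thesis sets out to improve can be sketched as follows. This is a minimal illustration only, assuming the standard simple-matching dissimilarity (the number of attributes on which two objects differ) and mode-based cluster centers. It does not implement the new dissimilarity measure, whose definition is not given in this abstract. The function names `simple_matching`, `column_modes`, and `k_modes` are hypothetical, and passing the initial modes explicitly is a simplifying assumption.

```python
from collections import Counter


def simple_matching(x, y):
    """Classical k-modes dissimilarity: number of mismatched attributes."""
    return sum(a != b for a, b in zip(x, y))


def column_modes(rows):
    """Most frequent value of each attribute across the given objects."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_modes, max_iter=100):
    """Alternate assignment and mode update until labels stop changing.

    `initial_modes` is supplied by the caller (an assumption of this
    sketch; real implementations choose the initial modes themselves).
    """
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assign each object to the nearest mode (ties go to the lowest index).
        new_labels = [
            min(range(len(modes)), key=lambda j: simple_matching(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each mode from its members; keep the old mode if empty.
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes
```

Because simple matching treats every mismatch as a distance of 1 and every match as 0, two objects that agree on a rare attribute value look no more similar than two that agree on a very common one. That is the kind of limitation the weighted measure described in contribution (4) is meant to address.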
Keywords/Search Tags: clustering analysis, k-modes clustering algorithm, dissimilarity measure, categorical attribute