Research On Partitional Clustering Algorithms For Mixed Data

Posted on: 2014-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Chang
GTID: 2268330401477056
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an important area of data mining and machine learning. Because it helps explore unknown data structure when no prior information is available, it has become an important tool for data granulation and information compression. Driven by practical applications, researchers have proposed a variety of clustering algorithms, and cluster analysis plays an important role in marketing, information retrieval and classification, image and video processing, bioinformatics, and social networks. However, most existing clustering algorithms can handle only numerical data or only categorical data, and are not very effective for mixed data described by both numerical and categorical attributes. In practical applications, mixed data are more common. Clustering mixed data therefore remains a challenging problem at both the theoretical and the algorithmic level.

With the aims of improving accuracy and reducing resource consumption, this thesis analyzes the advantages and disadvantages of existing clustering algorithms for mixed data and investigates the clustering of mixed data within the framework of the k-prototypes algorithm. To remedy the shortcomings of existing cluster-center representations for categorical data, a new representation, named multi-modes, is first introduced. To reflect the dissimilarity between objects and clusters more accurately, the Euclidean distance is generalized to handle mixed attributes. On this basis, a partitional clustering algorithm for mixed data is proposed.

The main work of this thesis includes the following:
(1) The research background and significance of cluster analysis, and its state of the art both domestically and internationally, are briefly introduced.
(2) The basic concepts of clustering and data types are introduced first, followed by an analysis of several primary clustering algorithms and of the applications of cluster analysis.
(3) Existing clustering algorithms for mixed data are analyzed in terms of how they process the data and of their advantages and disadvantages.
(4) Based on the new representation of cluster centers for categorical data and the generalized Euclidean distance, a partitional clustering algorithm for mixed data is proposed. Its effectiveness is verified by experiments on synthetically generated data sets and UCI data sets.
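
As a rough illustration of what a Euclidean-style dissimilarity over mixed attributes can look like, the following Python sketch compares an object with a cluster center. The abstract does not give the definitions of multi-modes or of the generalized distance, so everything here is an assumption: the categorical center is stood in for by a per-attribute frequency profile, and the function names `categorical_profile` and `mixed_dissimilarity` are hypothetical. It should not be read as the thesis's actual method.

```python
from collections import Counter
import math


def categorical_profile(values):
    """Relative frequency of each category observed in a cluster (one attribute)."""
    counts = Counter(values)
    total = len(values)
    return {cat: n / total for cat, n in counts.items()}


def mixed_dissimilarity(x_num, x_cat, center_num, center_cat):
    """Euclidean-style distance between an object and a mixed-attribute center.

    Numerical part: squared differences from the cluster means.
    Categorical part (assumption, not taken from the thesis): squared
    differences between the object's one-hot encoding and the cluster's
    category-frequency profile for each categorical attribute.
    """
    num_part = sum((a - b) ** 2 for a, b in zip(x_num, center_num))
    cat_part = 0.0
    for value, profile in zip(x_cat, center_cat):
        # Include the object's own category even if the cluster never saw it.
        categories = set(profile) | {value}
        for cat in categories:
            indicator = 1.0 if cat == value else 0.0
            cat_part += (indicator - profile.get(cat, 0.0)) ** 2
    return math.sqrt(num_part + cat_part)


# Toy cluster with one numerical and one categorical attribute.
cluster_num = [[1.0], [3.0]]
cluster_cat = [["red"], ["blue"]]

center_num = [sum(row[0] for row in cluster_num) / len(cluster_num)]
center_cat = [categorical_profile([row[0] for row in cluster_cat])]

print(mixed_dissimilarity([2.0], ["red"], center_num, center_cat))
print(mixed_dissimilarity([2.0], ["green"], center_num, center_cat))
```

In this toy setup an object whose category never occurs in the cluster ends up farther from the center than one whose category is shared by half of the members, which is the kind of behavior a single-mode center cannot express.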
Keywords/Search Tags:data mining, cluster analysis, mixed data, dissimilarity measure, K-Prototypes algorithm