Research On Partitional Clustering Algorithms For Mixed Data

Posted on:2014-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:Q Q Chang

Full Text:PDF

GTID:2268330401477056

Subject:Computer software and theory

Abstract/Summary:

Cluster analysis is an important area of data mining and machine learning. Because it helps explore the structure of unknown data when no prior information is available, it has become an important tool for data granulation and information compression. Driven by practical applications, researchers have proposed a variety of clustering algorithms, and cluster analysis plays an important role in fields such as marketing, information retrieval and classification, image and video processing, bioinformatics, and social networks. However, most existing clustering algorithms can handle only numerical data or only categorical data, and are not very effective for mixed data, that is, data described by numerical and categorical attributes at the same time. In practical applications, mixed data are the more common case. Clustering mixed data therefore remains a challenging problem at both the theoretical and the algorithmic level.

With the aims of improving accuracy and reducing resource consumption, this thesis analyzes the advantages and disadvantages of existing clustering algorithms for mixed data and investigates the clustering of mixed data within the framework of the k-prototypes algorithm. To make up for the deficiencies of cluster centers for categorical data, a new representation, named multi-modes, is first given. To reflect the dissimilarity between objects and clusters more accurately, the Euclidean distance is then generalized to handle mixed attributes. On this basis, a partitional clustering algorithm for mixed data is proposed.

The main work of this thesis is as follows:

(1) The research background and significance of cluster analysis, and its state of the art both domestically and internationally, are briefly introduced.

(2) The basic concepts of clustering and data types are introduced, followed by an analysis of several major clustering algorithms and of the applications of cluster analysis.

(3) Existing clustering algorithms for mixed data are analyzed in terms of how they process the data and of their advantages and disadvantages.

(4) Based on the new representation of cluster centers for categorical data and the generalized Euclidean distance, a partitional clustering algorithm for mixed data is proposed. Its effectiveness is verified by experiments on synthetically generated data sets and UCI data sets.
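
For context, the k-prototypes framework mentioned in the abstract is commonly described with a dissimilarity that adds a squared Euclidean term over the numerical attributes to a weighted count of mismatches over the categorical attributes. The sketch below illustrates that standard measure only. It is not the thesis's multi-modes representation or its generalized Euclidean distance, which the abstract does not specify. The function names, the `gamma` parameter name, and the sample values in the note that follows are assumptions made for illustration.

```python
from typing import Hashable, List, Sequence, Tuple

# A prototype is assumed here to be a pair: (numerical means, categorical modes).
Prototype = Tuple[Sequence[float], Sequence[Hashable]]


def k_prototypes_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[Hashable],
    proto_num: Sequence[float],
    proto_cat: Sequence[Hashable],
    gamma: float = 1.0,
) -> float:
    """Standard k-prototypes dissimilarity between an object and a prototype.

    Numerical part: squared Euclidean distance.
    Categorical part: number of mismatched attribute values (simple matching).
    gamma (assumed name) weights the categorical part against the numerical part.
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    x_num: Sequence[float],
    x_cat: Sequence[Hashable],
    prototypes: List[Prototype],
    gamma: float = 1.0,
) -> int:
    """Return the index of the prototype with the smallest dissimilarity."""
    distances = [
        k_prototypes_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)
```

As a usage illustration with made-up values, `k_prototypes_dissimilarity([1.0, 2.0], ["red", "small"], [0.0, 2.0], ["red", "large"], gamma=0.5)` evaluates to 1.5: the numerical part contributes 1.0 and the single categorical mismatch contributes 0.5. A partitional algorithm in this framework would call something like `nearest_prototype` in its assignment step and then recompute each prototype from its assigned objects.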

Keywords/Search Tags:

data mining, cluster analysis, mixed data, dissimilarity measure, K-Prototypes algorithm

Related items

1	Research On K-Prototypes Algorithm Based On Mixed Data And Implementation Of Spark Platform
2	Research On Partitioning Clustering Algorithms For Data With Mixed Numerical And Categorical Attributes
3	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
4	Dissimilarity Measure Based On The Free Parameters Of Data Mining Research
5	Clustering Algorithm Of Missing Data Based On Dissimilarity Measure
6	The Research On Clustering Algorithm For Mixed Numeric And Categorical Values Based Partitioning Methods
7	Research And Application Of Clustering Analysis Algorithm Based On Mixed Attribute Data
8	Research On Clustering Algorithms For The Data With Multidimensional Mixed Attributes
9	Based On Cluster Analysis Of The Data Mining Algorithm
10	Study On Partitioning Clustering Algorithms Based On Mixed Data