The Research On Clustering Algorithm For Mixed Numeric And Categorical Values Based Partitioning Methods

Posted on:2011-08-31

Degree:Master

Type:Thesis

Country:China

Candidate:W Chen

Full Text:PDF

GTID:2178360308469042

Subject:Software engineering

Abstract/Summary:

Cluster analysis is an important data mining technique and an active topic in data mining research. Among the data types to be clustered, data with mixed numeric and categorical values is the most common. The values of a categorical attribute are finite, unordered, and cannot be compared by magnitude, and these characteristics cause several problems. For instance, no reasonable dissimilarity measure is readily available to describe the differences between samples, and converting categorical values to numeric values usually does not produce effective results. As a result, many clustering algorithms designed for numeric attributes are unsuitable for data with categorical attributes, while the few algorithms that can handle such data still leave room for improvement in performance and clustering quality. Exploring and improving clustering algorithms for data with mixed numeric and categorical attributes is therefore an important topic in cluster analysis.

With the aims of improving accuracy and reducing resource consumption, this thesis analyzes the advantages and disadvantages of clustering algorithms for mixed-attribute data and studies these problems on the basis of the k-prototypes algorithm. To reduce the influence of choosing the initial cluster centers at random, the thesis introduces a new selection method based on a linear model, so that the initial centers better reflect the characteristics of the data set. In addition, existing dissimilarity measures, which reflect the distances between objects, cannot make effective use of the information contained in the clusters, especially as the volume of data grows and the data set becomes more complex. To address these problems, the thesis improves the dissimilarity formula and then designs a new algorithm for data with mixed numeric and categorical values.

The contents of the thesis are as follows:
(1) An outline of the research background, both domestic and international.
(2) An analysis and comparison of several major clustering algorithms, and an introduction to the data types and how each is handled during clustering.
(3) A description of the k-prototypes algorithm and an analysis of its advantages and disadvantages, followed by improvements to its method of choosing initial cluster centers and to its dissimilarity measure.
(4) A clustering algorithm for mixed numeric and categorical values based on the improved k-prototypes algorithm, together with a simulation experiment platform on an English set, with the algorithm implemented in Visual C++ and the database built on SQL Server, used to validate the performance of the improved algorithm comprehensively. The experimental results indicate that it has better stability and higher accuracy.
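
For orientation, the following is a minimal Python sketch of the standard k-prototypes building blocks that the abstract refers to: a mixed dissimilarity (squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches), assignment to the nearest prototype, and the mean/mode prototype update. It is not the improved dissimilarity formula or the linear-model initialization proposed in the thesis, which the abstract does not specify. The function names, the data layout, and the weight `gamma` are assumptions chosen for illustration.

```python
from collections import Counter


def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """Standard k-prototypes dissimilarity between an object and a prototype.

    Numeric part: squared Euclidean distance.
    Categorical part: number of mismatched attributes (simple matching),
    weighted by gamma (an illustrative default, not a value from the thesis).
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical_part = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(x_num, x_cat, prototypes, gamma=1.0):
    """Return the index of the prototype closest to the object."""
    distances = [
        mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(prototypes)), key=distances.__getitem__)


def update_prototype(members):
    """Recompute a prototype from a non-empty list of (numeric, categorical) tuples.

    Numeric attributes take the mean; categorical attributes take the mode.
    """
    n = len(members)
    num_cols = zip(*(m[0] for m in members))
    cat_cols = zip(*(m[1] for m in members))
    p_num = tuple(sum(col) / n for col in num_cols)
    p_cat = tuple(Counter(col).most_common(1)[0][0] for col in cat_cols)
    return p_num, p_cat
```

In this formulation, `gamma` balances the two attribute types: a larger value makes categorical mismatches count more relative to numeric distance. A full k-prototypes run would alternate `nearest_prototype` and `update_prototype` until the assignments stop changing.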

Keywords/Search Tags:

Data mining, clustering, data with mixed numeric and categorical values, k-prototypes algorithm

Related items

1	Research On Partitioning Clustering Algorithms For Data With Mixed Numerical And Categorical Attributes
2	Research On Partitional Clustering Algorithms For Mixed Data
3	The Research Of Ant-Based Clustering Algorithm For Data Sets With Mixed Attribute
4	Research On Categorical Data Clustering Algorithms
5	Research On Cluster Validity Indices For Categorical Data Clustering
6	Research And Application Of Clustering Analysis Algorithm Based On Mixed Attribute Data
7	Novel Fuzzy Clustering Algorithm Based On Nature Inspired Computation
8	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
9	Research On Cluster Boundary Detecting Technology For Categorical Data
10	Research Of Clustering Algorithms For Categorical Data