Research And Implementation On Variable Weighting In K-means Type Clustering

Posted on:2007-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:X M Li

Full Text:PDF

GTID:2178360212466979

Subject:Computer Science and Technology

Abstract/Summary:

Many clustering algorithms have been proposed, but k-means type algorithms remain widely used in real-world applications such as marketing research and data mining. They are efficient on very large data sets and can handle the numeric and categorical variables that are ubiquitous in real databases. However, a major problem in using k-means type algorithms for data mining is the selection of variables. These algorithms cannot select variables automatically because they treat all variables equally in the clustering process. In practice, an interesting clustering structure usually occurs in a subspace defined by a subset of the initially selected variables rather than in the entire variable set, and noise variables hinder cluster discovery. Data in real databases, such as customer databases, are often described by a large number of attributes (variables). Selecting a proper set of variables for clustering from a real-world database is therefore a difficult and important problem in data mining applications, because variables do not contribute equally to the discovery of clusters.

Firstly, a k-means type clustering algorithm with automated variable weighting (W-k-means) is implemented in this paper, and an experiment on a synthetic data set is presented. The W-k-means results are compared with those of the standard k-means algorithm without variable weighting and of the k-means algorithm with fixed variable weights, to verify the good performance of W-k-means in identifying noise variables and discovering clusters. Secondly, to handle categorical variables, a new algorithm called W-k-mode, based on W-k-means and k-mode, is proposed and implemented. Thirdly, to handle both numeric and categorical variables, a new algorithm called W-k-prototypes, based on W-k-means and k-prototypes, is proposed and implemented. Finally, based on the W-k-prototypes algorithm, a clustering analysis system fitting the CRISP (Cross Industry Standard Process for Data Mining) model is implemented.
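To make the idea of automated variable weighting concrete, here is a minimal sketch in pure Python. It is not the thesis implementation: the abstract does not state the exact update rule, so the sketch assumes a commonly used scheme in which each variable's weight shrinks as its within-cluster dispersion grows, with weights normalized to sum to 1. The names `w_k_means`, `weighted_dist`, `init_centers`, `beta`, and `eps` are all hypothetical, and the tiny data set is invented for illustration only.

```python
def weighted_dist(p, c, weights, beta):
    """Squared Euclidean distance with each variable scaled by weight**beta."""
    return sum((w ** beta) * (a - b) ** 2 for w, a, b in zip(weights, p, c))


def w_k_means(points, init_centers, beta=2.0, iters=20, eps=1e-12):
    """Sketch of k-means with automatically updated variable weights.

    Assumption: weight_j = 1 / sum_t (D_j / D_t) ** (1 / (beta - 1)),
    where D_j is the within-cluster dispersion of variable j.
    """
    if beta <= 1.0:
        raise ValueError("this sketch assumes beta > 1")
    m = len(points[0])
    k = len(init_centers)
    centers = [list(c) for c in init_centers]
    weights = [1.0 / m] * m  # start with all variables treated equally
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center under the current weights.
        labels = [
            min(range(k), key=lambda l: weighted_dist(p, centers[l], weights, beta))
            for p in points
        ]
        # Center step: per-variable mean of each non-empty cluster.
        for l in range(k):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                centers[l] = [sum(col) / len(members) for col in zip(*members)]
        # Weight step: eps (an assumption) guards against zero dispersion.
        disp = [
            eps + sum((p[j] - centers[lab][j]) ** 2 for p, lab in zip(points, labels))
            for j in range(m)
        ]
        e = 1.0 / (beta - 1.0)
        weights = [
            1.0 / sum((disp[j] / disp[t]) ** e for t in range(m)) for j in range(m)
        ]
    return labels, centers, weights


# Invented data: variable 0 separates two groups, variable 1 is noise.
points = [
    (0.0, 1.0), (0.2, 3.0), (0.4, 0.0), (0.1, 4.0),
    (10.0, 3.5), (10.3, 0.5), (9.8, 2.0), (10.1, 4.0),
]
labels, centers, weights = w_k_means(points, [points[0], points[4]])
print(labels, weights)
```

On this toy input the noise variable ends up with a much smaller weight than the informative one, which is the behavior the abstract attributes to W-k-means. The initial centers are passed in explicitly to keep the sketch deterministic; a practical implementation would need its own initialization strategy.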

Keywords/Search Tags:

data mining, clustering analysis, variable weighting

Related items

1	Research On Feature Weighting And Feature Selection-based Data Mining Algorithms
2	Research And Implementation Of Scenic Area Information Mining System Based On Feature Weighting And Density Clustering
3	Clustering Analysis Based On Improved K - Means Algorithm
4	Research On New Elastic Network Algorithm For Cluster Analysis
5	The Research On Clustering Analysis And Its Application In Data Mining Of Mobile Communication Enterprise
6	Technology Research, Data Mining Based On Fuzzy Clustering
7	Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Local Weighting
8	Research On Clustering Algorithm Based On Data Mining And Its Application
9	Clustering Analysis In Data Mining Research And Application Of The Algorithm
10	The Research On Clustering Analysis And Its Application In Web Log Mining