The Research On The Method To Measure The Validity And To Abstract Knowledge Of Clustering

Posted on:2009-10-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y Li

Full Text:PDF

GTID:2178360272477183

Subject:Computer software and theory

Abstract/Summary:

Clustering is an unsupervised learning method. Its task is to group data according to a given similarity measure so that items in the same group are similar to one another and dissimilar to items in other groups. Because the data set being clustered carries no group labels, the quality of a clustering cannot be assessed with traditional approaches such as "training-testing". Cluster validity indexes are therefore needed: they check whether a clustering result satisfies the requirement of "similar within a group, dissimilar between groups". However, there is no universal criterion of similarity. When the definition of similarity used by the clustering algorithm is inconsistent with the one used by the validity index, the index value is uninformative. Interpretability is also an important factor in evaluating a clustering. When validity is checked, the clustering should first be interpreted and then measured on the basis of that interpretation, and interpreting a clustering result requires abstracting it. Several forms of knowledge can be used to describe clusters. Describing clusters by representatives is a classic approach, and clustering algorithms based on representatives perform well. This thesis therefore presents a new cluster validity method based on representatives. The method first abstracts the clustering result and then measures the abstracted information. The measure is based on the minimum description length (MDL) principle, a fundamental principle in machine learning. Experiments show that the new method generally agrees with most classic methods and performs better under some special conditions. Moreover, the new method can also describe the structure of the clusters.
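The abstract does not give the formula of the MDL-based measure, so the following Python sketch is only an illustration of the general idea and not the thesis's actual method. It assumes a simple two-part code: the cost of storing the representatives, plus the cost of encoding each point given its nearest representative under a spherical Gaussian. The function names, the `precision` parameter, and the toy data are all hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def description_length(points, representatives, precision=0.5):
    """Approximate two-part code length (in bits) of a representative-based clustering.

    Assumptions (illustrative, not taken from the thesis):
    - each representative coordinate costs (1/2) * log2(n) bits;
    - naming a point's representative costs log2(k) bits;
    - residuals are coded with one shared spherical Gaussian,
      discretised to the given precision.
    """
    n = len(points)
    d = len(points[0])
    k = len(representatives)

    # Model cost: the abstracted description of the clustering.
    model_bits = 0.5 * k * d * math.log2(n)

    # Cost of stating which representative describes each point.
    assignment_bits = n * math.log2(k)

    # Data cost: deviation of each point from its nearest representative.
    sse = sum(
        min(squared_distance(p, r) for r in representatives) for p in points
    )
    # Floor the variance so the code never resolves below the precision.
    variance = max(sse / (n * d), precision ** 2)
    residual_bits = n * d * (
        0.5 * math.log2(2 * math.pi * math.e * variance) - math.log2(precision)
    )

    return model_bits + assignment_bits + residual_bits


# Toy data: two compact groups on a unit grid, far apart from each other.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

one_rep = [(5.5, 5.5)]
two_reps = [(0.5, 0.5), (10.5, 10.5)]
eight_reps = [tuple(float(c) for c in p) for p in points]

for name, reps in [("1 representative", one_rep),
                   ("2 representatives", two_reps),
                   ("8 representatives", eight_reps)]:
    print(name, round(description_length(points, reps), 2))
```

Under these assumptions, the two-representative description yields the shortest code on the toy data: a single representative leaves large residuals, while one representative per point pays too much for the model itself. This is the trade-off an MDL-style validity measure is meant to capture, and the chosen representatives double as a compact description of the cluster structure.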

Keywords/Search Tags:

data mining, machine learning, clustering, clustering validity, clustering interpretation, incremental clustering

Related items

1	The Research On Several Issues Of Clustering And Clustering Validity Indexes
2	Research On Dynamic Clustering And Incremental In Data Mining
3	Study Of Fuzzy Clustering Algorithm And Its Validity
4	Clustering Algorithm In The Web Mining Applications
5	A Clustering Validity Index Based On Noise Suppression And Its Application
6	Research On Clustering Algorithms In Traffic Domain
7	Research And Simulation Of Clustering Algorithm In Data Mining
8	Determination Of Optimal Clustering Number Of Mixed Data And Its Application
9	Research On Incremental Clustering Method Of News Text Based On Contrastive Learning
10	A Kind Of Efficient Clustering Validity Index And Its Application