
The Research On The Method To Measure The Validity And To Abstract Knowledge Of Clustering

Posted on: 2009-10-14  Degree: Master  Type: Thesis
Country: China  Candidate: Y Li
GTID: 2178360272477183  Subject: Computer software and theory
Abstract/Summary:
Clustering is an unsupervised learning method. Its task is to group data according to a given similarity measure, so that data in the same group are similar to each other and dissimilar to data in other groups. Because the data set being clustered carries no group labels, the quality of a clustering cannot be assessed with traditional methods such as "training-testing". Instead, cluster validity indexes are used to check whether a clustering result satisfies the requirement "similar within a group, dissimilar between groups". However, there is no universal criterion of similarity. When the similarity defined by the clustering algorithm is inconsistent with the one used by the validity index, the index value is of little use.

Interpretability is an important factor in evaluating a clustering: a clustering should first be interpreted, and its validity should then be measured on the basis of that interpretation. Interpreting a clustering result requires abstracting it, and several forms of knowledge can be used to describe clusters. The representative method, which describes each cluster by one or more representative points, is a classic way to do so, and clustering algorithms based on it perform very well. This thesis therefore presents a new cluster validity method based on the representative method. The method first abstracts the clustering result and then measures the abstracted information, using the Minimum Description Length (MDL) principle, a basic principle of machine learning, as the basis of the measure. Experiments show that the new method generally agrees with most classic validity methods and performs better under some special conditions. In addition, it can describe the structure of the clusters.
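The abstract does not give the thesis's actual formula. As a rough illustration of the general idea of scoring a clustering by the length of its abstracted description, the Python sketch below assumes that each cluster is summarized by a single representative (its mean) and uses a simple textbook two-part MDL code: the bits needed to store the representatives, plus the bits needed to encode each point given its representative. The function name `mdl_score`, the parameters `precision` and `param_bits`, and the Gaussian residual code are all hypothetical choices made for this example, not the author's method.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def mdl_score(points: Sequence[Sequence[float]], labels: Sequence[int],
              precision: float = 0.01, param_bits: int = 32) -> float:
    """Return a two-part description length, in bits, for a clustering.

    Lower is better. Illustrative MDL-style score (hypothetical, not the
    thesis's formula): each cluster is represented by its mean.
    """
    if len(points) != len(labels) or not points:
        raise ValueError("points and labels must be non-empty and equally long")
    d = len(points[0])
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    k = len(groups)

    # Model cost: every coordinate of each representative (cluster mean) and
    # each cluster's residual standard deviation is stored in param_bits bits.
    model_bits = k * (d + 1) * param_bits

    # Cost of naming each point's cluster (0 bits when k == 1).
    label_bits = len(points) * math.log2(k)

    # Data cost: residuals from the representative, encoded with a
    # zero-mean Gaussian code quantized at the given precision.
    data_bits = 0.0
    for members in groups.values():
        n_c = len(members)
        rep = [sum(p[j] for p in members) / n_c for j in range(d)]
        sq = [(p[j] - rep[j]) ** 2 for p in members for j in range(d)]
        # Pooled variance over all coordinates; clamping sigma >= precision
        # keeps every per-residual cost positive.
        var = max(sum(sq) / len(sq), precision ** 2)
        for r2 in sq:
            # -log2(Gaussian density * precision) bits for one residual
            data_bits += (-math.log2(precision)
                          + 0.5 * math.log2(2 * math.pi * var)
                          + r2 / (2 * var * math.log(2)))
    return model_bits + label_bits + data_bits


# Usage: two well-separated 3x3 grids of points.
blob_a = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
blob_b = [(5 + 0.1 * i, 5 + 0.1 * j) for i in range(3) for j in range(3)]
pts = blob_a + blob_b
good = [0] * 9 + [1] * 9   # one cluster per blob
one = [0] * 18             # everything in a single cluster
bad = [0, 1] * 9           # clusters that mix both blobs
for name, labs in [("good", good), ("one", one), ("bad", bad)]:
    print(name, round(mdl_score(pts, labs), 1))
```

Lower scores are better. Because the model cost grows with the number of clusters, this kind of score penalizes splitting the data into more clusters than the savings in residual encoding justify, while a labeling that mixes distant points pays heavily in residual bits.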
Keywords/Search Tags: data mining, machine learning, clustering, clustering validity, clustering interpretation, incremental clustering