
Research On Cluster Validity Indices For Categorical Data Clustering

Posted on: 2014-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2268330401462540
Subject: Computer application technology
Abstract/Summary:
Clustering is an unsupervised machine learning technique. It partitions unorganized data into a set of clusters whose members are similar to one another, which makes subsequent data processing more convenient. Clustering is widely used in bioinformatics, psychology research, business analysis, and text processing. Although clustering is a mature technique, many problems remain to be solved.

Cluster validity assessment is a critical step in cluster analysis and an important topic in machine learning. It allows us to determine the clustering tendency of a dataset and its number of clusters. Because there are so many kinds of clustering algorithms and data, no single index can handle all of them. We therefore need to be familiar with most of the existing indices, so that we can select suitable ones or propose new indices for the problem at hand.

Cluster validity indices are mostly designed for data with numerical attributes, but categorical data are so common in practice that some of the traditional indices cannot be applied to them. We have therefore modified some of these indices to suit different kinds of problems.

We tested three indices on four categorical datasets and analyzed the results. The results support our conclusions and, on the whole, meet our needs.
Keywords/Search Tags: cluster validity, categorical attribute data, data mining
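The abstract does not name the three indices that were tested or describe how they were modified. As a generic illustration only, and not the thesis's method, the sketch below shows one common way to carry a numerical validity index over to categorical data: the silhouette width computed with a simple matching (Hamming) dissimilarity in place of Euclidean distance. The function names `hamming` and `silhouette` and the toy records are assumptions made for this example.

```python
from typing import Hashable, Sequence

Record = Sequence[Hashable]


def hamming(a: Record, b: Record) -> float:
    """Fraction of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def silhouette(records: Sequence[Record], labels: Sequence[int]) -> float:
    """Mean silhouette width using Hamming dissimilarity (illustrative)."""
    # Group record indices by cluster label.
    clusters: dict[int, list[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes 0 by convention
        # a: mean dissimilarity to the other members of the same cluster.
        a = sum(hamming(records[i], records[j]) for j in own) / len(own)
        # b: smallest mean dissimilarity to any other cluster.
        b = min(
            sum(hamming(records[i], records[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(records)


# Toy categorical data (hypothetical): two attributes per record.
records = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large")]
good = [0, 0, 1, 1]  # groups identical records together
bad = [0, 1, 0, 1]   # splits identical records apart
print(silhouette(records, good), silhouette(records, bad))
```

A higher value indicates a partition whose clusters are more compact and better separated, so comparing the score across candidate numbers of clusters is one way such an index can help choose the number of clusters.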