Research On Categorical Data Clustering Algorithms

Posted on: 2008-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Wu
Full Text: PDF
GTID: 2178360242978898
Subject: Computer applications and technology
Abstract/Summary:
As databases grow and competition between companies intensifies, organizations need to extract information from their data to support decision making. Data mining emerged to meet this need, and cluster analysis is one of its most important techniques. Most previous research has been confined to clustering numerical data; comparatively little addresses clustering algorithms for categorical data. This thesis studies categorical data clustering in depth. Its main contents are as follows:

The initialization method of a clustering algorithm directly influences the clustering result. Researchers have proposed many initialization methods, but none is widely accepted, and few are designed for categorical data. We propose a new initialization method composed of a set of basic steps and a refinement framework. The basic steps combine distance and density. The refinement framework restricts the initial centers to be chosen from more suitable sub-sample points.

This thesis analyzes the drawbacks of the existing fuzzy K-Modes algorithm and proposes a novel algorithm for categorical data sets, the fuzzy K-Patterns algorithm. The new algorithm introduces new definitions of cluster center and distance. Fuzzy K-Patterns yields better clustering results on many public data sets.

The cluster validity index directly determines the resulting distribution of clusters. This thesis compares existing validity indexes, proposes a new validity index built from new components, and puts forward a further index specifically for categorical data sets. Experimental results show that both new validity indexes are effective.
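To make the idea of an initialization that combines distance and density more concrete, the sketch below picks initial centers for categorical records. It is an illustration only and not the method of the thesis, whose details are not given in this abstract. The density measure (average relative frequency of a record's attribute values), the simple matching distance, the product scoring rule, and the names `matching_distance`, `density`, and `init_centers` are all assumptions chosen for the example; the refinement framework over sub-samples is not modeled.

```python
from collections import Counter


def matching_distance(x, y):
    """Simple matching dissimilarity: number of attributes whose values differ."""
    return sum(a != b for a, b in zip(x, y))


def density(point, value_counts, n):
    """Assumed density: average relative frequency of the point's attribute values."""
    total = sum(value_counts[j][v] for j, v in enumerate(point))
    return total / (n * len(point))


def init_centers(data, k):
    """Pick k initial centers from data (a list of equal-length tuples).

    Hypothetical rule: the first center is the densest record; each later
    center maximizes (distance to nearest chosen center) * (own density).
    """
    n = len(data)
    # Per-attribute frequency table of category values.
    value_counts = [Counter(column) for column in zip(*data)]
    dens = [density(p, value_counts, n) for p in data]

    chosen = [max(range(n), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i):
            nearest = min(matching_distance(data[i], data[c]) for c in chosen)
            return nearest * dens[i]

        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]
```

Calling `init_centers(records, k)` on a list of tuples returns `k` of the records as starting centers. Multiplying distance by density is one simple way to favor points that are both far from existing centers and typical of their neighborhood; other combinations are equally plausible.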
Keywords/Search Tags: Data Mining, Cluster Analysis, Categorical Attribute