
The Research Of Clustering Based On Rough Set Theory

Posted on: 2008-03-15; Degree: Master; Type: Thesis
Country: China; Candidate: J B Chen
GTID: 2178360215996698; Subject: Computer application technology
Abstract/Summary:
Data mining combines machine learning, database technology, and statistical theory. It seeks interesting or valuable information in large, incomplete, noisy, imprecise, and random data. Cluster analysis is an important research problem in data mining. The goal of clustering is to partition a data set, without any prior knowledge, into clusters such that data within a cluster are similar and data in different clusters are dissimilar. This absence of prior knowledge distinguishes clustering from classification, which is why clustering is also known as "unsupervised classification". As a module in a data mining system, cluster analysis can be used both as a standalone technique for discovering how data are distributed and as a preprocessing step for other data mining operations. Improving the performance of clustering algorithms is therefore a worthwhile research goal.
Rough set theory (RST), introduced by Pawlak in the early 1980s, is a mathematical tool for dealing with vagueness and uncertainty. It is good at analyzing the facts hidden in data without requiring any additional knowledge about the data. Because of these advantages, rough set theory has received increasing attention from researchers and has been applied in a variety of areas in recent years. In data mining, rough sets were at first used only for classification, but research on rough sets has since expanded to many aspects of data mining.
This thesis first introduces the concepts and main methods of data mining, with a detailed analysis and comparison of the main kinds of clustering algorithms, and then gives an improved clustering algorithm based on the hierarchical method. It then studies rough set theory and develops an attribute reduction method based on algebraic operations. Because rough sets are good at dealing with incomplete and uncertain information, they are introduced into clustering to improve traditional clustering methods, and the positive effect of the improved algorithm is demonstrated by experiment. Finally, the thesis analyzes the relationship between granularity and clustering and studies the application of rough sets to clustering within the framework of granularity. A granularity-based clustering algorithm is presented and tested on two data sets from the UCI database. The results show that, compared with the clustering algorithm that does not use the concept of granularity, the granularity-based algorithm clusters the data more accurately. This indicates that applying rough set theory to clustering within the framework of granularity can clearly improve cluster quality.
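The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the standard rough set notions it relies on, not the method developed in the thesis. It groups objects into indiscernibility classes (the "granules" induced by a chosen attribute subset) and computes the lower and upper approximations of a target set. The function names and the toy table are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object ids whose values agree on every attribute in `attributes`."""
    groups = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        groups[key].add(obj_id)
    return list(groups.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of `target` under `attributes`."""
    lower, upper = set(), set()
    for granule in indiscernibility_classes(table, attributes):
        if granule <= target:   # granule lies entirely inside the target set
            lower |= granule
        if granule & target:    # granule overlaps the target set
            upper |= granule
    return lower, upper


# Hypothetical toy information table: object id -> attribute values.
table = {
    1: {"color": "red", "size": "small"},
    2: {"color": "red", "size": "small"},
    3: {"color": "blue", "size": "small"},
    4: {"color": "blue", "size": "large"},
}
target = {1, 3}

lower, upper = approximations(table, ["color", "size"], target)
coarse_lower, coarse_upper = approximations(table, ["color"], target)
```

With both attributes the granules are {1, 2}, {3}, and {4}, so the target is only roughly describable: object 3 certainly belongs to it, while objects 1 and 2 cannot be told apart. Dropping `size` makes the granules coarser and the approximation looser, which is one simple way to see why the choice of attributes, and hence attribute reduction, affects how finely data can be separated.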
Keywords/Search Tags: Data Mining, Clustering, Rough Set, Attribute Reduction, Granularity