
Research On Fuzzy Clustering And Clustering Ensemble In Data Mining

Posted on: 2009-05-07  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Fu  Full Text: PDF
GTID: 2178360245489580  Subject: Computer application technology
Abstract/Summary:
With the rapid progress of data collection and data storage technology, large volumes of data have accumulated, and extracting useful information from them has become a challenge. Data mining is a technology that addresses this challenge and has shown strong capability. As one of the most important branches of data mining, clustering analysis divides a set of physical or abstract objects into several clusters of similar objects. Objects in the same cluster are similar to one another, while objects in different clusters are dissimilar.
Traditional clustering analysis focuses on "hard" partitions, in which every object is assigned to exactly one cluster and the boundaries between clusters are sharp. In the real world, however, most objects are fuzzy in their patterns and class features, so "soft" partitions suit them better. Fuzzy set theory is a powerful analysis tool for this problem. Applying fuzzy methods to clustering is called Fuzzy Clustering Analysis.
Clustering high-dimensional data is widely regarded as a hard problem in clustering analysis. In this thesis, we present the basic concepts of fuzzy sets and fuzzy relations, summarize the principles and methods of fuzzy clustering, and discuss the most frequently used algorithms. Furthermore, a novel feature-weighted fuzzy clustering method based on the Minimum Spanning Tree (MST) is proposed for high-dimensional sparse data objects. It merges high-dimensional sparse data effectively, brings the clustering result close to the real data structure, and yields higher clustering quality.
Because of the importance and specialization of clustering analysis, the field has made great progress in recent years, and many clustering algorithms have appeared. Clustering ensemble is a method for combining different partitions produced either by different algorithms or by one algorithm with different parameters. It has been shown to be an effective method that leads to better results than a single algorithm.
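As a concrete illustration of the difference between "hard" and "soft" partitions, the following minimal Python sketch computes fuzzy memberships with the standard fuzzy c-means (FCM) membership formula, u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)). It is not taken from the thesis. The function names, the fixed cluster centers and the fuzzifier value m = 2 are assumptions chosen for illustration, and the sketch covers only the membership step, not the full iterative algorithm.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Return u[i][k], the degree to which points[i] belongs to centers[k].

    Standard FCM membership formula with fuzzifier m > 1 (illustrative only).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: give it full membership there
            # (shared equally if several centers coincide with the point).
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
               for d_k in dists]
        memberships.append(row)
    return memberships

def harden(memberships):
    """Collapse a soft partition to a hard one: pick the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]

# Three points on a line and two assumed cluster centers at the ends.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
u = fuzzy_memberships(points, centers)
labels = harden(u)
```

The middle point is equally far from both centers, so its soft membership is split evenly between the two clusters, whereas the hard partition has to place it in one of them.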
Most research concentrates on combining crisp clusterings and pays little attention to fuzzy clustering ensembles. In this thesis, by analyzing fuzzy set theory and existing clustering ensemble methods, a decision model for fuzzy clustering ensemble is presented. The novel algorithm first obtains the optimal partition, called the "expert" here, from H individual fuzzy partitions generated by the fuzzy c-means algorithm with different parameters. Then a fuzzy voting scheme is used to produce the "majority judge". Finally, the two judges are combined by the decision model. Experimental results on artificial data sets, UCI data sets and Web data sets show the effectiveness of the proposed method.
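The first two steps above can be sketched roughly as follows. The abstract does not say how the "expert" is selected, how the voting is carried out, or what the decision model looks like, so everything in this Python sketch is an assumption. The expert is picked by the partition coefficient (a common fuzzy validity index). Cluster labels are aligned to the expert by brute-force permutation search. The vote is a plain average of the aligned membership matrices. The final decision-model step is left out because the abstract gives no details. All function names and the toy membership matrices are hypothetical.

```python
from itertools import permutations

def partition_coefficient(u):
    """Mean squared membership; larger values mean a crisper fuzzy partition."""
    return sum(v * v for row in u for v in row) / len(u)

def align(u, reference):
    """Reorder the columns of u so its clusters best match the reference.

    Brute force over all label permutations; only practical for few clusters.
    """
    c = len(reference[0])

    def overlap(perm):
        return sum(ref_row[k] * row[perm[k]]
                   for ref_row, row in zip(reference, u)
                   for k in range(c))

    best = max(permutations(range(c)), key=overlap)
    return [[row[best[k]] for k in range(c)] for row in u]

def fuzzy_vote(partitions, reference):
    """Average the membership matrices after aligning them to the reference."""
    aligned = [align(u, reference) for u in partitions]
    n, c, h = len(reference), len(reference[0]), len(aligned)
    return [[sum(a[i][k] for a in aligned) / h for k in range(c)]
            for i in range(n)]

# Toy stand-ins for H = 3 fuzzy partitions of 3 objects into 2 clusters.
partitions = [
    [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]],
    [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]],  # same structure, labels swapped
    [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]],
]
expert = max(partitions, key=partition_coefficient)   # assumed selection rule
majority = fuzzy_vote(partitions, expert)             # assumed voting rule
```

The alignment step matters because cluster labels are arbitrary: the second toy partition describes the same grouping as the first with its labels swapped, and averaging without relabeling would cancel that agreement out.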
Keywords/Search Tags: Data Mining, Fuzzy Clustering, Clustering Ensemble, Decision Model, Fuzzy c-means (FCM) algorithm