The Research On Knowledge-Driven Fuzzy Clustering Algorithm

Posted on: 2011-02-26 | Degree: Master | Type: Thesis
Country: China | Candidate: J Zhao
GTID: 2178360302999293 | Subject: Operational Research and Cybernetics
Abstract/Summary:
Clustering is widely regarded as a fundamental approach to discovering patterns in data. This study examines how auxiliary hints, available as part of domain knowledge, can be exploited and effectively incorporated into the pattern recognition problem at hand.

First, a new knowledge-driven clustering algorithm, Proximity Affinity Propagation (P-AP), is introduced. It uses a predefined criterion together with proximity hints supplied by users to modify the similarity matrix. Because this strategy brings the analyst's knowledge into the clustering process, it makes clustering more flexible for specific problems.

Second, a Large Sample Clustering Algorithm (LSCA) is proposed to address two problems: it is hard to obtain a prescribed number of clusters with the above algorithm, and large sample data sets are difficult to cluster. LSCA can be regarded as a combination of Fuzzy C-Means (FCM) and Affinity Propagation (AP), and it runs in two stages. In the first stage, a distributed computing strategy divides the original data set into several subsets, and Affinity Propagation discovers the exemplars (centroids) of each subset. In the second stage, all exemplars discovered in the first stage are pooled into a single set, and Fuzzy C-Means is applied to them to produce the number of clusters predefined by the analyst. Each sample is then placed in the same cluster as the exemplar it belonged to in the first stage. In this stage, fuzzy entropy is also introduced as an auxiliary tool to measure the reliability of the fuzzy partition.

Several experiments were conducted to evaluate the effectiveness of the proposed algorithms.
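The P-AP idea described above (fold user hints into the similarity matrix, then run affinity propagation on the modified matrix) can be sketched in Python/NumPy as follows. This is an illustration under stated assumptions, not the thesis implementation: the abstract does not give the hint format or the modification rule, so a hint is assumed to be a triple `(i, j, p)` with `p` in [0, 1], and the hypothetical helper `apply_proximity_hints` maps `p = 1` to the largest off-diagonal similarity and `p = 0` to the smallest, interpolating linearly in between. The negative squared Euclidean similarity and the median preference are common AP defaults, not choices confirmed by the thesis.

```python
import numpy as np


def similarity_matrix(X):
    """Negative squared Euclidean similarities; diagonal (preference) = median off-diagonal value."""
    diff = X[:, None, :] - X[None, :, :]
    S = -(diff ** 2).sum(axis=-1)
    off = S[~np.eye(len(X), dtype=bool)]
    np.fill_diagonal(S, np.median(off))
    return S


def apply_proximity_hints(S, hints):
    """HYPOTHETICAL hint rule (assumption, not from the thesis): each hint (i, j, p), p in [0, 1],
    sets s(i, j) = s(j, i) = p * s_max + (1 - p) * s_min over the off-diagonal entries."""
    S = S.copy()
    off = S[~np.eye(len(S), dtype=bool)]
    s_min, s_max = off.min(), off.max()
    for i, j, p in hints:
        S[i, j] = S[j, i] = p * s_max + (1 - p) * s_min
    return S


def affinity_propagation(S, damping=0.5, max_iter=200, conv_iter=15):
    """Plain affinity propagation on S (diagonal = preferences).
    Returns (exemplar indices, labels), labels[i] = position of i's exemplar."""
    n = S.shape[0]
    if n == 1:
        return np.array([0]), np.array([0])
    R, A, rows = np.zeros((n, n)), np.zeros((n, n)), np.arange(n)
    last, stable = np.array([], dtype=int), 0
    for _ in range(max_iter):
        # Responsibilities: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: accumulated positive support for each candidate exemplar
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - np.diag(R))
        A = damping * A + (1 - damping) * A_new
        # Stop once a non-empty exemplar set has been stable for conv_iter iterations
        exemplars = np.flatnonzero(np.diag(A + R) > 0)
        same = exemplars.size > 0 and np.array_equal(exemplars, last)
        stable = stable + 1 if same else 0
        last = exemplars
        if stable >= conv_iter:
            break
    if last.size == 0:  # degenerate case: fall back to the single best candidate
        last = np.array([np.argmax(np.diag(A + R))])
    labels = S[:, last].argmax(axis=1)
    labels[last] = np.arange(last.size)  # each exemplar labels itself
    return last, labels


X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
S = similarity_matrix(X)
_, labels = affinity_propagation(S)  # plain AP on the raw similarities
S_hint = apply_proximity_hints(S, [(2, 3, 1.0)])  # hypothetical hint: samples 2 and 3 are close
_, labels_hint = affinity_propagation(S_hint)  # P-AP-style run on the modified matrix
print(labels, labels_hint)
```

A hint with `p` near 1 raises the similarity of a pair, which makes it more attractive for one sample to choose the other as its exemplar; whether the final partition changes still depends on the rest of the matrix.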
For Proximity Affinity Propagation, a small artificial data set, the Iris data set, and the Yale face data set were each clustered with P-AP. For the Large Sample Clustering Algorithm, experiments were carried out on the Iris and Shuttle data sets. The experimental results indicate that both algorithms are simple, adaptable, and easy to evaluate, and that they achieve good clustering results.
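The two-stage LSCA procedure described in the abstract can be sketched in Python/NumPy as below. This is an illustrative reconstruction, not the thesis code: the random split into `n_subsets` equal parts, the AP similarity and median preference, the FCM fuzzifier `m = 2`, and the use of Bezdek's partition entropy H(U) = -(1/n) Σ_i Σ_k u_ik log u_ik as the "fuzzy entropy" are all assumptions. Stage 1 runs affinity propagation on each subset; stage 2 clusters only the pooled exemplars with FCM, and every sample inherits the cluster of its stage-1 exemplar.

```python
import numpy as np


def similarity_matrix(X):
    """Negative squared Euclidean similarities; diagonal (preference) = median off-diagonal value."""
    diff = X[:, None, :] - X[None, :, :]
    S = -(diff ** 2).sum(axis=-1)
    np.fill_diagonal(S, np.median(S[~np.eye(len(X), dtype=bool)]))
    return S


def affinity_propagation(S, damping=0.5, max_iter=200, conv_iter=15):
    """Plain AP; returns (exemplar indices, labels = position of each point's exemplar)."""
    n = S.shape[0]
    if n == 1:
        return np.array([0]), np.array([0])
    R, A, rows = np.zeros((n, n)), np.zeros((n, n)), np.arange(n)
    last, stable = np.array([], dtype=int), 0
    for _ in range(max_iter):
        AS = A + S  # responsibilities: compare with the best competing candidate
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        Rp = np.maximum(R, 0)  # availabilities: accumulated positive support
        np.fill_diagonal(Rp, np.diag(R))
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - np.diag(R))
        A = damping * A + (1 - damping) * A_new
        exemplars = np.flatnonzero(np.diag(A + R) > 0)
        same = exemplars.size > 0 and np.array_equal(exemplars, last)
        stable = stable + 1 if same else 0
        last = exemplars
        if stable >= conv_iter:
            break
    if last.size == 0:
        last = np.array([np.argmax(np.diag(A + R))])
    labels = S[:, last].argmax(axis=1)
    labels[last] = np.arange(last.size)
    return last, labels


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers V, memberships U of shape c x n)."""
    U = np.random.default_rng(seed).random((c, len(X)))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)
        D = np.fmax(np.linalg.norm(X[None, :, :] - V[:, None, :], axis=-1), 1e-12)
        U_new = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2 / (m - 1))).sum(axis=1)
        if np.abs(U_new - U).max() < tol:
            return V, U_new
        U = U_new
    return V, U


def partition_entropy(U):
    """Assumed fuzzy entropy: 0 for a crisp partition, log(c) for uniform memberships."""
    return float(-(U * np.log(np.clip(U, 1e-12, 1.0))).sum() / U.shape[1])


def lsca(X, n_clusters, n_subsets, seed=0):
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_subsets)  # stage 1: split the data
    exemplar_ids, owner = [], np.empty(len(X), dtype=int)
    for idx in parts:
        ex, lab = affinity_propagation(similarity_matrix(X[idx]))
        owner[idx] = len(exemplar_ids) + lab  # position of each sample's exemplar in the pool
        exemplar_ids.extend(idx[ex])  # global indices of this subset's exemplars
    exemplar_ids = np.array(exemplar_ids)
    _, U = fcm(X[exemplar_ids], n_clusters, seed=seed)  # stage 2: FCM on exemplars only
    exemplar_cluster = U.argmax(axis=0)
    return exemplar_cluster[owner], U, partition_entropy(U)


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(8.0, 0.5, (30, 2))])
labels, U, H = lsca(X, n_clusters=2, n_subsets=3)
print(labels, round(H, 3))
```

Because FCM sees only the exemplars, the cost of stage 2 depends on how many exemplars stage 1 produces rather than on the full sample size. An entropy near 0 indicates a nearly crisp partition; values near log(c) indicate highly ambiguous memberships.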
Keywords/Search Tags: Fuzzy clustering, Proximity hints, Fuzzy C-Means (FCM), Affinity Propagation (AP), Large Sample Clustering