The Research On Fuzzy Clustering Combination Algorithm And Ensemble Diversity Analysis

Posted on: 2011-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: M Qi
Full Text: PDF
GTID: 2178360308465192
Subject: Computer software and theory
Abstract/Summary:
Data mining is the process of extracting hidden, previously unknown, and potentially useful knowledge and patterns from large volumes of data. Clustering, one of its central tasks, has been applied widely in domains such as data analysis, pattern recognition, market analysis, and image processing, and it is attracting growing research attention.
Without any labeled samples to learn from, cluster analysis partitions a dataset automatically so that samples in the same cluster are similar and samples in different clusters are dissimilar. Traditional cluster analysis focuses on hard partitions, in which the boundaries between clusters are crisp: each sample belongs to exactly one class. In practice, however, the class membership and pattern features of most samples are fuzzy, their attributes are not strictly delimited, and soft partitions suit them better.
Fuzzy set theory is a powerful tool for such soft partitioning problems. Applying fuzzy methods to clustering yields fuzzy clustering analysis, the combination of cluster analysis and fuzzy theory. Fuzzy clustering describes the uncertainty of each sample's membership, and this uncertainty can sometimes reflect the real world better than crisp methods do.
Clustering ensembles, a recent research hotspot in cluster analysis, can improve clustering performance by combining different partitions produced either by different algorithms or by one algorithm under different parameter settings. Current research concentrates on crisp clustering ensembles and pays little attention to combining fuzzy clusterings.
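To make the distinction between hard and soft partitions concrete, the following minimal sketch computes fuzzy C-means style membership degrees for a few one-dimensional points. The fixed centers, the fuzzifier m = 2, the sample values, and the function name are illustrative assumptions, not taken from the thesis.

```python
def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Return u[i][j], the degree to which points[i] belongs to centers[j].

    Uses the standard fuzzy C-means membership formula:
        u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
    """
    memberships = []
    for x in points:
        # eps keeps the ratio defined when a point coincides with a center
        dists = [abs(x - c) + eps for c in centers]
        row = []
        for d_j in dists:
            denom = sum((d_j / d_k) ** (2.0 / (m - 1.0)) for d_k in dists)
            row.append(1.0 / denom)
        memberships.append(row)
    return memberships


points = [0.0, 1.0, 5.0, 9.0, 10.0]
centers = [0.5, 9.5]  # assumed and held fixed, for illustration only

u = fcm_memberships(points, centers)

# A hard partition keeps only the strongest membership of each sample.
hard_labels = [row.index(max(row)) for row in u]
```

The point 5.0 lies midway between the two centers, so its memberships are equal. A hard partition must still assign it to one cluster and loses that information, while the fuzzy memberships retain it.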
This thesis aims to improve the classical fuzzy C-means algorithm and to use the characteristics of fuzzy clustering to enhance the performance of classifier ensembles. Furthermore, taking the differences between clustering members into account, it applies a fuzzy clustering ensemble to improve clustering quality, since fuzzy clustering provides more information than crisp clustering. The main contributions are summarized as follows.
Firstly, an improved fuzzy C-means clustering algorithm (SWFCM) is presented. The standard fuzzy C-means algorithm is sensitive to outliers, noise, and unevenly distributed samples. SWFCM modifies the objective function to eliminate the impact of outliers and, to distinguish the differing contributions of individual samples to knowledge discovery, assigns every sample a quantitative weight, which improves clustering results on noisy and imbalanced data. Experimental results show that the modified algorithm is more robust and achieves higher clustering accuracy.
Secondly, a novel two-level ensemble classifier algorithm (EWFuzzyBagging) based on fuzzy clustering is proposed. Fuzzy C-means is first used to cluster the instances, so that every instance obtains the fuzzy membership degree corresponding to its class label. The first-level ensemble obtains its component classifiers through bagging. The number of component classifiers equals the number of class labels in the dataset, with each component classifier corresponding to one class label. Each component classifier is trained on a random resample of the training set in which every instance is weighted by its membership degree in the corresponding class.
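The first-level sampling step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how cluster memberships are mapped to class labels, so the sketch assumes a membership matrix indexed by class is already available, and the function name and example values are hypothetical.

```python
import random


def membership_weighted_bootstrap(memberships, class_index, rng):
    """Draw a bootstrap sample of instance indices for one class label.

    memberships[i][c] is the (assumed) fuzzy membership of instance i in
    class c. Instances are drawn with replacement, with probability
    proportional to their membership in `class_index`.
    """
    weights = [row[class_index] for row in memberships]
    n = len(memberships)
    return rng.choices(range(n), weights=weights, k=n)


# Hypothetical memberships for 6 instances and 2 classes.
memberships = [
    [0.95, 0.05],
    [0.90, 0.10],
    [0.60, 0.40],
    [0.40, 0.60],
    [0.10, 0.90],
    [0.05, 0.95],
]

rng = random.Random(0)

# One bootstrap sample per class label; each sample would be used to
# train the component classifier associated with that class.
samples = [membership_weighted_bootstrap(memberships, c, rng) for c in range(2)]
```

Under this scheme, instances with a high membership in a class tend to appear more often in that class's training sample, while ambiguous instances are drawn less often.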
The second-level ensemble combines the class-specific component classifiers produced by the first level through dynamic weighted majority voting to obtain the final classification result. The new approach is more robust than Bagging and AdaBoost.
Thirdly, a fuzzy cluster ensemble approach based on mutual information is proposed. First, the algorithm exploits the fact that fuzzy C-means selects its initial cluster centers randomly to obtain cluster members that differ from one another, and combines these members through a voting strategy to obtain a rough result. Second, it compares each member with the rough result, judges the member's stability by mutual information, and assigns the member a weight according to that stability. Finally, the weighted cluster members generate the final ensemble result through a voting strategy. Experimental results show that the proposed approach takes the differing stability of the members into account and can therefore clearly improve clustering performance and yield a better ensemble result.
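The three steps of the mutual-information-based ensemble (rough vote, stability weighting, weighted vote) can be sketched as below. The sketch rests on several assumptions that the abstract does not confirm: each member is reduced to hard labels, cluster labels are already aligned across members, and the raw mutual information with the rough result is used directly as the member's weight. All function names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two label sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), n_xy in count_ab.items():
        mi += (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
    return mi


def weighted_vote(members, weights):
    """Per-sample weighted vote over label-aligned members.

    Ties are broken in favour of the smallest label.
    """
    result = []
    for labels in zip(*members):
        score = Counter()
        for label, w in zip(labels, weights):
            score[label] += w
        result.append(max(sorted(score), key=score.get))
    return result


def mi_weighted_ensemble(members):
    # Step 1: unweighted vote gives a rough consensus.
    rough = weighted_vote(members, [1.0] * len(members))
    # Step 2: a member's agreement with the rough result is its stability weight.
    weights = [mutual_information(m, rough) for m in members]
    # Step 3: weighted vote gives the final result.
    return weighted_vote(members, weights), weights


# Hypothetical members: two stable, one slightly off, one unstable.
members = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]

final, weights = mi_weighted_ensemble(members)
```

In this example the unstable fourth member receives a much smaller weight than the others, so it has little influence on the final vote.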
Keywords/Search Tags: fuzzy C-means, fuzzy membership, dynamic weighted, clustering ensemble, mutual information, voting strategy