The Research On Selective Cluster Ensemble Learning

Posted on: 2011-10-02  Degree: Master  Type: Thesis
Country: China  Candidate: S Li
GTID: 2178360308465541  Subject: Computer software and theory
Abstract/Summary:
With the development of data collection and data storage technology, the datasets used in machine learning keep growing, and a single learner cannot always solve such problems properly. To improve the generalization ability of learning systems, methods that use multiple learners to solve a problem have been proposed. Ensemble learning is now widely used in sensor fault tolerance, handwritten character recognition, biometric authentication, radiation source identification, linguistics, transportation, medicine, and management.

The goal of ensemble learning is to obtain a highly reliable recognition system by exploiting each individual learner. In other words, it produces learners with high generalization ability and high diversity, and combines them into a high-performance system. The performance of a clustering ensemble is usually improved by raising the clustering ability of each component learner and by increasing the diversity among the component learners. Traditional clustering ensemble methods, however, still have shortcomings. First, clustering is unsupervised learning, so it ignores the small amount of labeled data a dataset may contain, which makes combining the component learners difficult. Second, every learner plays a fixed and identical role in the combination, even though selecting only some of the learners may be better than using all of them.

To address these problems, this paper studies selective cluster ensembles, that is, selecting a subset of the individual learners to improve the performance of the cluster ensemble. By making full use of the characteristics of the data and of a small amount of labeled data, it improves both the accuracy and the efficiency of clustering. The main work of this paper is summarized as follows.

Firstly, this paper introduces the basic concepts of ensemble learning and several representative algorithms, analyzes the shortcomings of clustering ensemble learning, and presents corresponding solutions.

Secondly, a selective clustering ensemble method based on bagging is proposed and implemented. The original data set is first partitioned into several equal-sized subsets; re-sampling is then used to draw data at random, and the drawn data are assigned to each subset. The member learners are produced with an improved k-means algorithm, and mutual information is then used to adjust the clustering results. Finally, disputed data points are reassigned to a cluster according to their distance from the cluster centers. Experimental results on 10 UCI data sets show that the proposed method achieves higher clustering accuracy than SimpleKMeans.

Thirdly, this paper proposes a semi-supervised clustering algorithm based on a classifier. The algorithm trains a weak classifier to classify the original data set roughly; an improved version of the traditional k-means algorithm then processes the classification results, which are clustered with the k-meansGuider method, and the clustering results are finally integrated. The algorithm makes full use of the few labeled data to guide the initial clustering, and it extends the traditional k-means algorithm by changing how cluster centers are selected. It can also find clusters of arbitrary shape and is insensitive to noisy data. The algorithm was implemented on the Weka platform, and experimental results on 15 data sets show that it achieves higher clustering accuracy.
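The abstract does not specify how mutual information is used to choose or adjust ensemble members. One common way to realize the idea of selecting only part of the learners is to score each member clustering by its average normalized mutual information (NMI) with the other members and keep the top-scoring fraction. The Python sketch below assumes that criterion; the functions `nmi` and `select_members` and the `fraction` parameter are illustrative names, not taken from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information I(A;B) / sqrt(H(A) * H(B)) between two labelings."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    mi = sum(nab / n * math.log(n * nab / (ca[x] * cb[y]))
             for (x, y), nab in Counter(zip(a, b)).items())
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # A one-cluster labeling has zero entropy, so NMI is undefined; use a convention.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def select_members(members, fraction=0.5):
    """Hypothetical selection rule: keep members with the highest mean NMI to the others."""
    scores = []
    for i, m in enumerate(members):
        others = [nmi(m, o) for j, o in enumerate(members) if j != i]
        scores.append(sum(others) / len(others))
    keep = max(1, round(fraction * len(members)))
    ranked = sorted(range(len(members)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])  # indices of the selected members
```

NMI ignores how clusters are numbered, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0. Favoring members that agree with the consensus is only one design choice; other selective-ensemble schemes also reward diversity.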
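A minimal Python sketch of a bagging-based clustering ensemble with reassignment of disputed points, under several stated assumptions: plain bootstrap resampling replaces the partition-then-resample scheme, ordinary Lloyd's k-means stands in for the improved k-means, members are combined by label alignment and majority voting, and a point counts as disputed when fewer than `agree` of the members vote for its winning label. All function names and parameters here are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def dist2(p, q):
    """Squared Euclidean distance between two points (tuples of numbers)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: dist2(p, centers[c]))


def mean(pts):
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; a stand-in for the thesis's improved k-means."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous center.
        new = [mean(g) if g else centers[c] for c, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def align(labels, ref, k):
    """Relabel `labels` with the permutation that agrees most with `ref` (brute force)."""
    best = max(permutations(range(k)),
               key=lambda perm: sum(perm[a] == b for a, b in zip(labels, ref)))
    return [best[a] for a in labels]


def bagged_cluster_ensemble(points, k, n_members=10, agree=0.7, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        sample = [rng.choice(points) for _ in points]  # bootstrap resample
        centers = kmeans(sample, k, rng)
        members.append([nearest(p, centers) for p in points])  # label every point
    ref = members[0]
    members = [align(m, ref, k) for m in members]

    final, disputed = [], set()
    for i in range(len(points)):
        label, votes = Counter(m[i] for m in members).most_common(1)[0]
        final.append(label)
        if votes / n_members < agree:
            disputed.add(i)  # members disagree too much on this point

    # Centers from undisputed points only; each disputed point moves to the nearest one.
    centers = {}
    for c in range(k):
        pts = [points[i] for i in range(len(points)) if final[i] == c and i not in disputed]
        if pts:
            centers[c] = mean(pts)
    if centers:
        for i in disputed:
            final[i] = min(centers, key=lambda c: dist2(points[i], centers[c]))
    return final
```

Trying every permutation in `align` costs k! evaluations, so it only suits small k; a Hungarian-algorithm matching would scale better.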
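The idea of letting a few labeled points guide the initial clustering by changing how cluster centers are chosen can be illustrated with seeded k-means (Basu et al., 2002), a known semi-supervised variant that initializes each center as the mean of the labeled points of one class and then runs ordinary k-means. It is swapped in here purely as an illustration: the thesis's weak-classifier step and its k-meansGuider method are not reproduced, and `seeded_kmeans` and its `seeds` argument are hypothetical names.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def seeded_kmeans(points, seeds, k, iters=100):
    """Seeded k-means: labeled points fix the initial centers, then Lloyd's iterations run.

    points: list of equal-length tuples
    seeds:  dict {index into points: class label in range(k)}, at least one per class
    """
    centers = []
    for c in range(k):
        grp = [points[i] for i, y in seeds.items() if y == c]
        if not grp:
            raise ValueError(f"no labeled seed for class {c}")
        centers.append(mean(grp))  # initial center = mean of this class's seeds

    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            grp = [p for p, lab in zip(points, labels) if lab == c]
            if grp:  # an empty cluster keeps its previous center
                centers[c] = mean(grp)
    return labels, centers
```

Because cluster c starts at the mean of class c's seeds, the output cluster ids usually line up with the class labels. In a pipeline like the one described above, the rough labels from a weak classifier could plausibly serve as the seeds, but that is an assumption.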
Keywords/Search Tags: Clustering, Ensemble Learning, Mutual Information, k-means, Ensemble Clustering, Bagging Algorithm