The Research On Selective Cluster Ensemble Learning

Posted on:2011-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:S Li

Full Text:PDF

GTID:2178360308465541

Subject:Computer software and theory

Abstract/Summary:

With the development of data collection and storage technology, the size of the data sets used in machine learning keeps growing, and a single learner cannot handle some problems properly. To improve the generalization capability of learning systems, methods that use multiple learners to solve a problem have been proposed. Ensemble learning is now widely used in sensor fault tolerance, handwritten character recognition, biometric authentication, radiation source identification, linguistics, transportation, medicine and management.

The purpose of ensemble learning is to obtain a highly reliable recognition system by making use of each individual learner. In other words, it produces learners with high generalization capability and high diversity, and combines them into a high-performance system. The performance of a clustering ensemble is usually improved by improving the clustering ability of each component learner and by increasing the diversity among the component learners. However, traditional clustering ensemble methods still have disadvantages. First, clustering is unsupervised and ignores the small amount of labeled data in a data set, which makes it difficult to combine the component learners. Second, the role of each learner in the combination is fixed, that is, all learners play the same role in the ensemble, although ensembling many of the learners may be better than ensembling all of them.

To solve these problems, this thesis studies selective clustering ensembles. The purpose of the research is to select a subset of the individual learners in order to improve the performance of the clustering ensemble. The thesis improves the accuracy and efficiency of clustering by making full use of the characteristics of the data and of a small amount of labeled data. The main work is summarized as follows.

Firstly, the thesis introduces the basic concepts of ensemble learning and several representative algorithms. It analyses the shortcomings of clustering ensemble learning and presents appropriate solutions.

Secondly, a selective ensemble clustering method based on bagging is proposed and implemented. First, the original data set is partitioned into several smaller subsets of equal size, and resampling is then used to select data at random and assign them to each subset. The member learners are produced by an improved k-means algorithm. Afterwards, mutual information is introduced to adjust the clustering results. Finally, the controversial data points are reassigned to a cluster by calculating the distance between each of them and the cluster centers. Experimental results on 10 different UCI data sets show that the proposed method has higher clustering accuracy than SimpleKMeans.

Thirdly, a semi-supervised clustering algorithm based on a classifier is proposed. The algorithm trains a weak classifier to classify the original data set roughly. The traditional k-means algorithm is then improved to handle the classification results, which are clustered with the k-meansGuider method. Finally, the clustering results are integrated. This algorithm makes full use of a few labeled data to guide the initial clustering, and it extends the traditional k-means algorithm by altering the selection of cluster centers. The algorithm is also able to find arbitrarily shaped clusters and is not sensitive to noisy data. We implement this algorithm on the Weka platform, and experimental results on 15 data sets show that the method has higher clustering accuracy.
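The abstract states that mutual information is used to adjust the clustering results, but it does not give the procedure. As an illustration only, the sketch below shows one common way mutual information can drive member selection in a clustering ensemble: each candidate partition is scored by its average normalized mutual information (NMI) with the other partitions, and only the highest-scoring ones are kept. The function names `nmi` and `select_members`, the parameter `keep`, and the ranking rule are assumptions made for this example, not the method of the thesis.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information between two labelings of the same points."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # Mutual information I(A;B) from joint and marginal label counts.
    mi = sum(
        (c / n) * math.log(c * n / (pa[x] * pb[y]))
        for (x, y), c in pab.items()
    )
    # Entropies of the two labelings.
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    if ha == 0.0 or hb == 0.0:
        # A single-cluster labeling carries no information.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def select_members(partitions, keep):
    """Return indices of the `keep` partitions that agree most with the rest.

    Hypothetical selection rule (an assumption of this sketch): rank each
    partition by its mean NMI with all other partitions. Needs >= 2 partitions.
    """
    m = len(partitions)
    scores = []
    for i in range(m):
        others = [nmi(partitions[i], partitions[j]) for j in range(m) if j != i]
        scores.append(sum(others) / len(others))
    ranked = sorted(range(m), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy ensemble: four labelings of six points (made-up data for illustration).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, with labels swapped
    [0, 0, 1, 1, 1, 1],  # close to the first two
    [0, 1, 0, 1, 0, 1],  # disagrees with the others
]
selected = select_members(partitions, keep=3)
```

On this toy input the fourth partition has the lowest average agreement and is dropped. NMI is invariant to label permutation, which is why the first two partitions count as identical; this matters for clustering ensembles because cluster labels from different members are not aligned.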

Keywords/Search Tags:

Clustering, ensemble learning, mutual information, k-means, ensemble clustering, bagging algorithm

Related items

1	Research On Classifier Ensemble
2	The Research On Fuzzy Clustering Combination Algorithm And Ensemble Diversity Analysis
3	Research On Ensemble-Initialized K-Means Clustering Algorithms
4	Study On H-K Clustering Algorithms Based On Ensemble Learning
5	Research On The Effectiveness Element Theory And Method Of Clustering Ensemble
6	Research On Key Technologies Of Clustering Ensemble
7	Research On Semi-supervised Classification Algorithm Based On Clustering Ensemble
8	Research Of Hybrid Clustering Ensemble Approaches
9	Research On Efficient Clustering Ensemble Algorithm Based On Random Subspace
10	Study On The Clustering Ensemble Algorithm Based On Granular Computing