
Research On Ensemble Learning

Posted on: 2008-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhou
Full Text: PDF
GTID: 2178360242976753
Subject: Computer software and theory
Abstract/Summary:
Recently, ensemble learning has developed into a popular method in the area of machine learning. Compared with direct learning approaches, it provides a framework for combining multiple classifiers. It usually begins by decomposing the problem; by evaluating the connections among the learned classifiers, the global concept can then be recovered from the partial ones.

The most appealing features of problem decomposition fall into the following aspects. First, as the scale and number of samples grow, traditional classifiers have no corresponding principles to cope with them, so decomposing a large-scale problem into several small ones is an appropriate solution. Second, most classifiers are designed around a single specific assumption: if the problem lies in the right hypothesis space, the performance will naturally be excellent, but in most cases such a strong assumption is unreasonable, and a more proper way is to view the problem from different perspectives. Third, noise is unavoidable in real environments; without distinguishing it, we would obtain an over-fitted model, so we need a way to identify the noisy parts and discard them in the decision procedure.

Ensemble learning has been successfully applied to multi-class problems. Samples are grouped according to the class boundaries, and the resulting binary subproblems are combined in a one-vs-one or one-vs-all manner, with the final labels decided by a voting procedure. In particular, under an assumed probabilistic model, the relationships among the classifiers can be measured precisely (e.g., by the KL divergence). Unfortunately, a probability output is not guaranteed for most classifiers; in order to preserve the core algorithms as much as possible, a sigmoid curve is fitted to approximate one instead.

In past work, M³ (the min-max modular network) has proved to be an excellent system for large-scale and imbalanced cases. Unlike ensemble frameworks designed only for multi-class problems, it continues to split a pairwise problem that is still hard, and the solution of the original problem is restored by a pair of principles called Min-Max.

We conclude that a reasonable cut along some prior-known boundaries tends to yield better performance in later classification; however, the theoretical explanation is still far from well understood. In this thesis, we apply statistical knowledge to give a theoretical description. The sample set is generated by some underlying probability distribution; even if the whole distribution is hard to learn, its parts are easier to model. According to the well-known Bayes decision theory, we can obtain the optimal prediction. Our new formula shows that the Min-Max principles are exactly the equivalent combination rules for classifiers that report 0-1 outputs. Moreover, the combination of the subsets of samples can be extended to a much wider scope, for which a fast algorithm is then proposed. The experimental results show that both the time and space complexity can be reduced to a linear level.
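As a concrete illustration of the one-vs-one decomposition and voting described above, the following minimal Python sketch trains one binary module per class pair and decides labels by majority vote. The nearest-centroid base learner is an illustrative stand-in for readability; the thesis itself works with support vector machines. Where probability outputs are needed, a Platt-style sigmoid P(y=1|f) = 1/(1+exp(Af+B)) can be fitted to the raw scores f, in the spirit of the sigmoid approximation mentioned above.

```python
# Minimal one-vs-one ensemble with majority voting (illustrative sketch;
# the toy nearest-centroid model stands in for the SVMs used in the thesis).
import numpy as np
from itertools import combinations

class NearestCentroid:
    """Toy binary base learner with the usual fit/predict interface."""
    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)
        self.c1 = X[y == 1].mean(axis=0)
        return self

    def predict(self, X):
        d0 = np.linalg.norm(X - self.c0, axis=1)
        d1 = np.linalg.norm(X - self.c1, axis=1)
        return (d1 < d0).astype(int)  # 1 if closer to the positive centroid

def one_vs_one_predict(X, y, X_test, base=NearestCentroid):
    """Train one module per class pair, then combine by voting."""
    classes = np.unique(y)
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for a, b in combinations(range(len(classes)), 2):
        mask = (y == classes[a]) | (y == classes[b])
        y_bin = (y[mask] == classes[b]).astype(int)  # class b -> 1, a -> 0
        pred = base().fit(X[mask], y_bin).predict(X_test)
        votes[pred == 0, a] += 1
        votes[pred == 1, b] += 1
    return classes[votes.argmax(axis=1)]
```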
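The Min-Max recombination itself can be summarized in a few lines. In the sketch below, the positive and negative training sets are each partitioned into subsets, one module is trained per (positive subset, negative subset) pair, and the module outputs are recombined by taking the MIN across negative subsets and then the MAX across positive subsets. The random partitioning and the subset counts are illustrative assumptions only; the thesis argues that cutting along prior-known boundaries gives better results.

```python
# Sketch of the M^3 (min-max modular) decomposition and recombination:
# y(x) = max_i min_j M_ij(x), where M_ij is trained on the i-th positive
# and j-th negative subset. Any binary learner with fit/predict works
# (e.g. the NearestCentroid above); the partitions here are random for brevity.
import numpy as np

def train_modules(X_pos, X_neg, n_pos, n_neg, base):
    rng = np.random.default_rng(0)
    pos_parts = np.array_split(rng.permutation(X_pos), n_pos)
    neg_parts = np.array_split(rng.permutation(X_neg), n_neg)
    modules = []
    for P in pos_parts:
        row = []
        for N in neg_parts:
            X_ij = np.vstack([P, N])
            y_ij = np.r_[np.ones(len(P), int), np.zeros(len(N), int)]
            row.append(base().fit(X_ij, y_ij))
        modules.append(row)
    return modules

def min_max_predict(modules, X):
    # MIN over modules sharing a positive subset, then MAX over the subsets.
    mins = [np.min([m.predict(X) for m in row], axis=0) for row in modules]
    return np.max(mins, axis=0)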
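For reference, the Bayes decision rule underlying the theoretical analysis can be written as follows. This is a hedged reconstruction from the abstract, not the thesis's own derivation: the optimal prediction picks the class with the maximal posterior, and when the class-conditional density is learned piecewise over sample subsets, the subset models enter as mixture components with assumed subset priors π_{kj}.

```latex
% Bayes-optimal decision over classes \omega_k (standard form; the exact
% per-subset formulation in the thesis is not reproduced here).
\hat{y}(x) = \arg\max_k P(\omega_k \mid x)
           = \arg\max_k \, p(x \mid \omega_k)\, P(\omega_k),
\qquad
p(x \mid \omega_k) = \sum_j \pi_{kj}\, p(x \mid \omega_k, j)
```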
Keywords/Search Tags: min-max modular network, large-scale and imbalanced data classification, ensemble learning, Bayes decision, support vector machine