
Research On Several Key Problems Of Ensemble Learning Algorithms

Posted on: 2012-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 1118330335492125
Subject: Computer software and theory

Abstract/Summary:
Classification is one of the most important tasks in machine learning and data mining, and it is widely used in real-world applications: for example, to judge whether an email is junk based on its title and content, or whether a patient's diagnosis is positive based on his or her clinical measurements. Many classification algorithms have been proposed, such as decision trees, Bayesian networks, neural networks, and support vector machines.

Ensemble learning algorithms train multiple base learners and then combine their predictions to make the final decision. Since the generalization ability of an ensemble can be significantly better than that of a single learner, methods for constructing good ensembles have attracted a great deal of attention in the machine learning literature over the past decade. It is widely recognized that to build a strong ensemble, the component learners should be both highly accurate and highly diverse. Many ensemble algorithms have been proposed, such as Bagging, AdaBoost, Random Subspace, and Random Forest, and these methods have been successfully applied in many real-world applications. However, many unresolved problems remain: Bagging can only improve the performance of unstable classifiers; AdaBoost tends to overfit the training data and cannot be parallelized; Random Subspace only works well on data containing many redundant features; and for some specific learning algorithms, such as naive Bayes and support vector machines, the improvement brought by all of these ensemble methods is trivial. These problems indicate that new ensemble methods still need to be investigated and designed.

In this thesis, we concentrate mainly on designing effective ensemble methods based on jointly manipulating the input attributes and the class attribute. In addition, we provide online updating formulae for the generalized inverse of a centered matrix.
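To make the general idea concrete, the following is a minimal sketch of an ensemble in the Bagging style mentioned above: each base learner is trained on a bootstrap sample and the ensemble predicts by majority vote. The toy "decision stump" base learner and the dataset are hypothetical illustrations, not any algorithm from this thesis.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw a bootstrap sample (with replacement) of the same size as data."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Toy base learner: a 1-D threshold 'decision stump' over (x, y) pairs."""
    threshold = sum(x for x, _ in sample) / len(sample)
    left = [y for x, y in sample if x <= threshold]
    right = [y for x, y in sample if x > threshold]
    # Predict the majority label on each side of the threshold.
    left_label = Counter(left).most_common(1)[0][0] if left else 0
    right_label = Counter(right).most_common(1)[0][0] if right else 1
    return lambda x: left_label if x <= threshold else right_label

def bagging_ensemble(data, n_learners=11, seed=0):
    """Train n_learners stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    learners = [train_stump(bootstrap_sample(data, rng))
                for _ in range(n_learners)]
    def predict(x):
        votes = Counter(learner(x) for learner in learners)
        return votes.most_common(1)[0][0]
    return predict

# Tiny illustrative dataset: small x values belong to class 0, large to class 1.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
predict = bagging_ensemble(data)
```

The bootstrap resampling perturbs the training set, which is what makes Bagging effective only for unstable base learners, as the abstract notes: a learner whose output barely changes across resampled training sets yields near-identical ensemble members and no diversity.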
The main contributions of this thesis include:

1. We propose a new method, MTForest, which ensembles decision tree learners based on multi-task learning. It works by enumerating each input attribute as an extra task, introducing a different additional inductive bias each time, so as to generate diverse yet accurate component decision trees in the ensemble.

2. We propose a novel general ensemble method, MACLEN, based on manipulating the class labels, which can be applied to both binary and multi-class learning problems. It generates differently biased new class labels through the Cartesian product of the class attribute and each input attribute, and then builds a component classifier for each of them.

3. We propose a novel randomization-based ensemble method for Bayesian network classifiers that effectively relaxes the conditional independence assumption of naive Bayes while retaining much of its efficiency. Unlike existing optimization-based structure learning strategies, it generates the structure of each component classifier completely at random, i.e., it randomly chooses the parent of each attribute in addition to the class attribute. Its main advantages are that it avoids the computationally expensive structure learning process, relaxes the conditional independence assumption, and improves stability by combining multiple learners.

4. In addition, we present exact updating formulae for the generalized inverse of a centered matrix when a row or column vector is inserted or deleted; the computational cost is O(mn) for an m×n matrix. Based on this, we propose the least squares online linear discriminant analysis (LS-OLDA) algorithm, which can be used for online dimensionality reduction.
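The label-manipulation step underlying the second contribution can be sketched in a few lines: pairing the original class label with the value of one input attribute yields, for each attribute, a differently biased multi-class problem, and a component classifier would then be trained on each relabelled dataset. This is a hypothetical illustration of the Cartesian-product idea only, not the thesis's actual MACLEN implementation; the dataset and function names are invented for the example.

```python
def cartesian_labels(X, y, attr_index):
    """Relabel each example with the pair (class label, value of one attribute).

    This realises the Cartesian product of the class attribute and the
    chosen input attribute as a new, larger label space.
    """
    return [(yi, xi[attr_index]) for xi, yi in zip(X, y)]

# Toy dataset: two nominal input attributes, binary class.
X = [["sunny", "hot"], ["rainy", "mild"], ["sunny", "mild"]]
y = ["yes", "no", "yes"]

# One relabelled (multi-class) training problem per input attribute;
# each would be handled by its own component classifier.
new_problems = {j: cartesian_labels(X, y, j) for j in range(len(X[0]))}
```

At prediction time, a component classifier's output over the product labels can be mapped back to the original class by discarding the attribute part of the pair; how the component predictions are then combined is specific to the method and not shown here.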
Keywords/Search Tags: ensemble learning, multi-task learning, decision tree, naïve Bayes, randomization, generalized inverse