Creating fast and accurate machine learning ensembles through training dataset preprocessing

Posted on:2011-02-21

Degree:Ph.D

Type:Dissertation

University:Indiana University

Candidate:Whitehead, Matthew E. N

GTID:1448390002966910

Subject:Artificial Intelligence

Abstract/Summary:

Machine learning algorithms make it possible to process large amounts of information faster and more accurately than ever before. Classification and regression algorithms build high-level mathematical models which can be used to approximate functions that map complex, high-dimensional input features to certain output classes or real-valued states.

Single machine learning models can be accurate and effective, but combining independent component machine learning models into groups, called ensembles, has been shown to increase overall classification accuracy for many problems. Ensemble techniques increase classification accuracy with the trade-off of increasing computation time, especially classifier training time.

In this dissertation, we will investigate the creation of highly efficient machine learning ensembles that have fewer component models than existing ensemble algorithms and that can be trained in a much shorter period of time. We use several forms of training dataset preprocessing in order to prepare the data to be used to create accurate ensembles. In particular, we use data clustering and dimensionality reduction using singular value decomposition to create small ensembles that retain high accuracies, but require fewer components than other ensemble methods. We also show how these algorithms can be used to work with a variety of data mining datasets to achieve a high classification accuracy that would be acceptable for use in practical applications.
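
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea of clustering the training data and fitting one small component model per cluster. Everything in it is an assumption made for illustration: plain k-means stands in for the clustering step, a nearest-class-centroid classifier stands in for the component model, the distance-weighted vote is one possible way to combine components, and the function names are hypothetical. The singular value decomposition step mentioned in the abstract is left out to keep the example free of external libraries.

```python
import math
import random
from collections import defaultdict


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    return centroids, assign


def train_component(X, y):
    """Nearest-class-centroid classifier: one mean vector per label."""
    by_label = defaultdict(list)
    for x, label in zip(X, y):
        by_label[label].append(x)
    return {label: mean(rows) for label, rows in by_label.items()}


def component_predict(model, x):
    """Return the label whose class centroid is closest to x."""
    return min(model, key=lambda label: math.dist(x, model[label]))


def train_ensemble(X, y, k=2, seed=0):
    """Cluster the training set, then fit one small model per cluster."""
    centroids, assign = kmeans(X, k, seed=seed)
    ensemble = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:  # skip clusters that ended up empty
            model = train_component([X[i] for i in idx], [y[i] for i in idx])
            ensemble.append((centroids[c], model))
    return ensemble


def ensemble_predict(ensemble, x):
    """Weighted vote: components whose cluster is nearer to x count more."""
    votes = defaultdict(float)
    for centroid, model in ensemble:
        weight = 1.0 / (1e-9 + math.dist(x, centroid))
        votes[component_predict(model, x)] += weight
    return max(votes, key=votes.get)


# Toy data (made up): two spatial groups, each containing both classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [6.0, 6.1], [6.2, 5.9], [5.9, 6.0]]
y = ["a", "a", "a", "b", "b", "b", "a", "a", "a", "b", "b", "b"]

ensemble = train_ensemble(X, y, k=2)
print(ensemble_predict(ensemble, [1.0, 1.0]))
print(ensemble_predict(ensemble, [5.0, 5.0]))
```

Each component sees only the points in its own cluster, so it trains on a fraction of the data, and the ensemble has exactly as many components as there are non-empty clusters. The weighting lets the component trained nearest to a query point dominate the vote.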

Keywords/Search Tags:

Machine learning, Accurate, Data, Classification, Ensembles, Training, Algorithms

Related items

1	Research On Text Classification Algorithms Based On Machine Learning
2	Application And Research Of Data Classification Based On Machine Learning Algorithms
3	Data complexity in machine learning and novel classification algorithms
4	Training The Classification Algorithm Based On Ep
5	Research And Implementation Of Classification Model On Big Data In Healthcare Based On Semi-supervised Learning Algorithm
6	Research On Classification Methods Based On Extreme Learning Machine
7	Research On Meta-heuristic Optimized Extreme Learning Machine Based Classification Algorithms And Application
8	Research On Membership Inference Attacks And Protections Of Training Data In Machine Learning
9	Research And Implementation Of Image Recognition Model Online Fast Training System For Small Scale Data
10	Machine learning approaches for dealing with limited bilingual training data in statistical machine translation