
Creating fast and accurate machine learning ensembles through training dataset preprocessing

Posted on: 2011-02-21
Degree: Ph.D.
Type: Dissertation
University: Indiana University
Candidate: Whitehead, Matthew E. N
GTID: 1448390002966910
Subject: Artificial Intelligence
Abstract/Summary:
Machine learning algorithms make it possible to process large amounts of information faster and more accurately than ever before. Classification and regression algorithms build high-level mathematical models that approximate functions mapping complex, high-dimensional input features to discrete output classes or real-valued outputs.

A single machine learning model can be accurate and effective, but combining independent component models into groups, called ensembles, has been shown to increase overall classification accuracy for many problems. This gain comes at the cost of additional computation, especially classifier training time, because every component model must be trained.

In this dissertation, we investigate the creation of highly efficient machine learning ensembles that have fewer component models than existing ensemble algorithms and that can be trained in much less time. We use several forms of training dataset preprocessing to prepare the data for building accurate ensembles. In particular, we use data clustering, and dimensionality reduction with singular value decomposition (SVD), to create small ensembles that retain high accuracy while requiring fewer components than other ensemble methods. We also show how these algorithms can be applied to a variety of data mining datasets, achieving classification accuracy high enough for use in practical applications.
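To make the general idea of preprocessing-driven ensembles concrete, here is a minimal sketch, assuming NumPy is available. It is a hypothetical illustration, not the dissertation's algorithm: the data are projected onto their top singular directions with `np.linalg.svd`, the reduced data are clustered with plain k-means, and one small component model is trained per cluster on all training data *outside* that cluster (a "leave-one-cluster-out" scheme chosen here only for illustration). The names `svd_reduce`, `kmeans`, `NearestMean`, and `ClusterSVDEnsemble`, the nearest-class-mean component model, and majority voting are all assumptions of this sketch.

```python
import numpy as np


def svd_reduce(X, n_components):
    """Return the mean and a projection basis from a truncated SVD of centered X."""
    mean = X.mean(axis=0)
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components].T  # shape (n_features, n_components)
    return mean, basis


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


class NearestMean:
    """Hypothetical tiny component model: predicts the class with the closest training mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


class ClusterSVDEnsemble:
    """Hypothetical small ensemble: SVD-reduce, cluster, then train one component
    per cluster on the data outside that cluster and combine by majority vote."""

    def __init__(self, n_components=2, n_clusters=3, seed=0):
        self.n_components = n_components
        self.n_clusters = n_clusters
        self.seed = seed

    def fit(self, X, y):
        y = np.asarray(y)
        self.mean_, self.basis_ = svd_reduce(X, self.n_components)
        Z = (X - self.mean_) @ self.basis_
        clusters = kmeans(Z, self.n_clusters, seed=self.seed)
        self.members_ = [
            NearestMean().fit(Z[clusters != j], y[clusters != j])
            for j in range(self.n_clusters)
        ]
        return self

    def predict(self, X):
        Z = (X - self.mean_) @ self.basis_
        votes = np.stack([m.predict(Z) for m in self.members_])  # (n_members, n_samples)
        out = []
        for col in votes.T:
            # Majority vote per sample; ties go to the smallest label value.
            vals, counts = np.unique(col, return_counts=True)
            out.append(vals[counts.argmax()])
        return np.array(out)


if __name__ == "__main__":
    # Two synthetic, well-separated Gaussian blobs in 5 dimensions.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(4.0, 1.0, (100, 5))])
    y = np.array([0] * 100 + [1] * 100)
    model = ClusterSVDEnsemble(n_components=2, n_clusters=3).fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

The ensemble here has only as many members as clusters, which mirrors the stated goal of using fewer components; the SVD step shrinks every member's input, so each component is cheaper to train. A real system would swap in stronger component classifiers and evaluate on held-out data.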
Keywords/Search Tags: Machine learning, Accurate, Data, Classification, Ensembles, Training, Algorithms