Improvements to random forest methodology

Posted on:2014-10-10

Degree:Ph.D

Type:Dissertation

University:Iowa State University

Candidate:Xu, Ruo

GTID:1458390005996022

Subject:Statistics

Abstract/Summary:

Random forest (RF) is a widely used machine learning method that shows competitive prediction performance in various fields, including biological science, finance, chemical engineering, agroscience, and medical analysis. In this dissertation, we study some characteristics and modifications of RFs in order to improve their prediction performance.

In CHAPTER 1, we review the mechanics of classification and regression trees (CARTs), bootstrap aggregation (bagging), and RFs. The properties of RFs are discussed, along with several variations of the method.

In CHAPTER 2, we describe a counter-intuitive discovery about RFs: out-of-sample prediction error can be reduced by augmenting the regressors with a new, scientifically meaningless predictor variable that is independent of all variables in the dataset. We explain this phenomenon using a simulated example and discuss its implications for interpreting predictor variable importance in RFs.

RF predictions can be biased. In CHAPTER 3, we apply an iterative debiasing approach based on bagging to RFs and test this bias correction method on real datasets. The debiasing approach can significantly improve RF predictions, and the number of debiasing iterations can be tuned using cross-validation.

Standard RF methodology generates a common RF from a given training sample, regardless of the test cases. In CHAPTER 4, we propose a new way to grow an RF specifically for predicting a particular test case, namely Case-Specific Random Forests (CSRF). We also suggest Case-Specific Variable Importance (CSVI), a new definition of predictor variable importance in terms of prediction performance on a particular test case.

Prediction error estimation is generally useful in evaluating a prediction rule. All existing methods estimate prediction error averaged over the distribution of a test set. In CHAPTER 5, we propose a method to estimate the expected prediction loss at a specific regressor point using RF methodology.

Keywords/Search Tags:

Method, Prediction, CHAPTER, Using

Related items

1	Chaotic Time Series Prediction Method And Its Application
2	Neural network models for prediction, estimation, and optimization: Algorithms and applications
3	On The Grey Prediction Method And Its Application In Watercraft Motion Modeling And Prediction
4	The Applied Research Of Multivariate Chaotic Time Series Analysis On Network Traffic Prediction
5	Research On Prediction Method Of Anomaly Job In Cluster Environment
6	Based On The Offline Time Series Data Of Sudden Failure Prediction
7	Integrated methods for rapid genetic analysis using microfluidic devices
8	Research On Temporal-aware QoS Prediction Method Based On Non-negative Latent Factorization Of Tensors
9	Research And Application Of Combinatorial Method For The Financial Risk Prediction
10	Application Researches On Interval Prediction Method Based On Bootstrap Method And Relevance Vector Machine