Analysis of robust measures for random forest regression

Posted on: 2005-11-13
Degree: Ph.D.
Type: Thesis
University: University of Virginia
Candidate: Brence, John R
GTID: 2458390008994744
Subject: Engineering
Abstract/Summary:
Our approach is based on random forest regression (RFR), with two major differences: the introduction of a robust prediction and a robust error statistic. The current methodology uses the node mean for prediction and the mean squared error (MSE) to derive the in-node and overall error. Herein, we introduce and assess the use of the median (and other robust measures) for prediction and the mean absolute deviation (MAD) to derive the in-node and overall error. Extensive research has shown that the median is a better estimate of the center of a distribution in the presence of large or unbounded outliers, because it largely ignores those outliers and bases its value on the ordered, central value(s) of the data.

Our research hypothesis is that robust methods should significantly improve the predictive performance of random forest methods for nonparametric regression when the data contain unbounded outliers and display heteroscedasticity. This hypothesis was tested using corrosion data and other datasets, and comparative performance among models was based on both the MSE and MAD statistics. We have shown that robust random forest regression (RRFR) performs well under extreme conditions: with datasets that include unbounded outliers or heteroscedastic conditions.

The nondestructive testing (NDT) data were derived from eddy current (EC) scans of the United States Air Force's (USAF) KC-135 aircraft. While a link between NDT results and corrosion is suspected, it has not until now been formally established; instead, the NDT data are converted into false-color images that maintenance operators analyze visually. Previous models that we introduced suggest that, by applying appropriate data mining techniques, noisy data can be handled more effectively with more sophisticated models rather than simpler ones. Moreover, while a variety of modeling techniques can predict corrosion with reasonable accuracy, regression trees are particularly effective at modeling the complex relationships between the eddy current measurements and the actual amount of corrosion. (Abstract shortened by UMI.)
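To make the two differences described in the abstract concrete, the sketch below contrasts the standard node rule (node mean with a squared-error criterion) with the robust rule (node median with an absolute-deviation criterion) for a single regression-tree split. It also includes the MSE and MAD statistics used to compare models. This is a minimal illustration under stated assumptions, not the thesis's RRFR implementation. The function names (`node_prediction`, `node_error`, `best_split`, `mse`, `mad`) are hypothetical, only one numeric feature is considered, and the split search is exhaustive. The forest layer (bootstrap samples, random feature subsets, and aggregation across trees) is omitted.

```python
from statistics import mean, median


def node_prediction(ys, robust=True):
    """Leaf value: the median (robust variant) or the mean (standard RFR)."""
    return median(ys) if robust else mean(ys)


def node_error(ys, robust=True):
    """In-node error used to score a split (hypothetical helper).

    Robust: total absolute deviation from the node median
    (node size times the mean absolute deviation).
    Standard: total squared deviation from the node mean
    (node size times the in-node MSE).
    """
    center = node_prediction(ys, robust)
    if robust:
        return sum(abs(y - center) for y in ys)
    return sum((y - center) ** 2 for y in ys)


def best_split(xs, ys, robust=True):
    """Exhaustively search thresholds on one numeric feature and return
    (error, threshold) minimizing the summed error of the two children."""
    pairs = sorted(zip(xs, ys))
    best_err, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = node_error(left, robust) + node_error(right, robust)
        if err < best_err:
            best_err = err
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_err, best_threshold


def mse(y_true, y_pred):
    """Overall mean squared error between observations and predictions."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def mad(y_true, y_pred):
    """Overall mean absolute deviation between observations and predictions."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


if __name__ == "__main__":
    ys = [1.0, 2.0, 3.0, 100.0]  # a single large outlier in the node
    print(node_prediction(ys, robust=True))   # 2.5
    print(node_prediction(ys, robust=False))  # 26.5
    print(best_split([1, 2, 3, 4, 5, 6], [1, 1, 1, 10, 10, 10]))  # (0, 3.5)
```

With the node values `[1.0, 2.0, 3.0, 100.0]`, the robust prediction is 2.5, while the mean is pulled up to 26.5 by the single outlier. The abstract attributes exactly this behavior to the median. Summing absolute deviations over both children is equivalent to weighting each child's mean absolute deviation by its size. This is one reasonable way to form a robust split criterion, assumed here for illustration.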
Keywords/Search Tags: Robust, Current, Prediction, Error, Corrosion