
Research on Improvements to Random Forests

Posted on: 2014-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z G Li
Full Text: PDF
GTID: 2268330428962252
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In the field of machine learning, the random forest is an important and widely used data-mining method. It offers high classification performance, has few parameters to tune, runs quickly and efficiently, is resistant to overfitting, and tolerates noise well. These strengths have made random forests successful in intelligent information processing, bioinformatics, finance, fault diagnosis, image recognition, industrial automation, and other fields.

Despite these achievements, random forests still have limitations and shortcomings, leaving room for improvement.

This thesis first improves the sample-similarity calculation for random forests and proposes an improved proximity matrix algorithm. Compared with the traditional method, the improved method adds a leaf-node path-distance measure, making the similarity between samples more precise. We apply the improved proximity matrix to classification and outlier detection; contrast experiments on UCI data sets show that the improved method achieves better classification performance than the traditional one, confirming its effectiveness.

The thesis then analyses the relationship between the classification margin and generalization ability and proposes a new method, margin-based weighted random forest pruning (MB-WRF). At each pruning step, the weight of each tree in the forest is first computed from its importance to the classification margin, and the least important tree is deleted; the weights of the remaining trees are then recalculated from their margin importance, so that every surviving tree receives its own weight. We compare MB-WRF with the standard random forest (RF).
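The pruning loop just described can be sketched in Python. Since the abstract does not give the exact formulas, the leave-one-tree-out margin importance and the shift-to-positive reweighting below are illustrative assumptions, not the thesis's actual definitions:

```python
import numpy as np

def average_margin(vote_matrix, y, weights):
    """Weighted average classification margin.

    vote_matrix : (n_trees, n_samples) array of per-tree predicted labels.
    y           : (n_samples,) true labels.
    weights     : (n_trees,) nonnegative tree weights.

    A sample's margin is its weighted vote share for the true class minus
    the largest weighted vote share for any other class.
    """
    classes = np.unique(np.concatenate([y, vote_matrix.ravel()]))
    total = weights.sum()
    margins = []
    for i in range(vote_matrix.shape[1]):
        scores = {c: weights[vote_matrix[:, i] == c].sum() for c in classes}
        other = max(v for c, v in scores.items() if c != y[i])
        margins.append((scores[y[i]] - other) / total)
    return float(np.mean(margins))

def mbwrf_prune(vote_matrix, y, n_keep):
    """Margin-based weighted pruning (sketch): repeatedly drop the tree
    whose removal costs the least margin, then reweight the survivors by
    their margin importance (the reweighting formula is an assumption)."""
    alive = list(range(vote_matrix.shape[0]))
    weights = np.ones(len(alive))
    while len(alive) > n_keep:
        base = average_margin(vote_matrix[alive], y, weights)
        importance = []
        for j in range(len(alive)):
            keep = [k for k in range(len(alive)) if k != j]
            sub = vote_matrix[np.array(alive)[keep]]
            # importance of tree j = margin lost when it is removed
            importance.append(base - average_margin(sub, y, weights[keep]))
        worst = int(np.argmin(importance))
        del alive[worst]
        imp = np.delete(np.asarray(importance), worst)
        weights = imp - imp.min() + 1e-6  # keep all weights positive
    return alive, weights
```

For example, in a five-tree ensemble where one tree always votes against the true class, that tree has negative margin importance and is the first one pruned.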
Contrast experiments on gene expression and UCI data sets show that MB-WRF achieves better classification performance with fewer trees than RF.
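Returning to the first contribution, the improved proximity matrix might be sketched as follows for scikit-learn trees. The classical RF proximity counts the fraction of trees in which two samples share a leaf; here each tree instead contributes a similarity that decays with the path distance between the two samples' leaves. The 1/(1+d) decay is an assumed form, since the abstract only states that a leaf-node path-distance measure is added:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def node_depths_and_parents(tree):
    """Depth and parent index of every node in a fitted sklearn tree."""
    left, right = tree.children_left, tree.children_right
    parent = np.full(left.shape[0], -1)
    depth = np.zeros(left.shape[0], dtype=int)
    stack = [0]
    while stack:
        n = stack.pop()
        for child in (left[n], right[n]):
            if child != -1:  # -1 marks "no child" (leaf)
                parent[child] = n
                depth[child] = depth[n] + 1
                stack.append(child)
    return depth, parent

def leaf_path_distance(a, b, depth, parent):
    """Number of edges on the tree path between leaves a and b."""
    da, db = depth[a], depth[b]
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:  # climb to the lowest common ancestor
        a, b = parent[a], parent[b]
    return da + db - 2 * depth[a]

def improved_proximity(forest, X):
    """Proximity matrix where same-leaf pairs score 1.0 per tree and
    other pairs score 1/(1 + leaf path distance) -- an assumed decay."""
    n = X.shape[0]
    prox = np.zeros((n, n))
    for est in forest.estimators_:
        leaves = est.apply(X)  # leaf index of each sample in this tree
        depth, parent = node_depths_and_parents(est.tree_)
        for i in range(n):
            for j in range(i, n):
                d = leaf_path_distance(leaves[i], leaves[j], depth, parent)
                prox[i, j] += 1.0 / (1.0 + d)
                prox[j, i] = prox[i, j]
    return prox / len(forest.estimators_)

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)
P = improved_proximity(rf, X[:20])
```

Because the path distance is zero when two samples share a leaf, this measure reduces to the traditional proximity for same-leaf pairs while still grading how "close" different leaves are, which is what makes the similarity finer-grained.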
Keywords/Search Tags: Random Forest, Proximity Matrix, Classification Margin, Weighting