
Optimizing Voting Process Of Random Forests Algorithm Based On Weighted Decision Trees

Posted on: 2018-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: X D Ma
Full Text: PDF
GTID: 2348330518983424
Subject: Computer application technology
Abstract/Summary:
Random forests (RF) are an ensemble classifier whose main idea is to construct a multitude of decision trees using two random processes: randomly sampling from the training dataset and randomly selecting features. The final prediction is decided by a vote among all decision trees, which effectively helps avoid over-fitting. Moreover, the relative independence of the decision trees makes RF well suited to parallel computing, which improves the prediction efficiency of the RF model and helps it handle high-dimensional datasets. These characteristics have led to RF being widely used in many engineering applications and have made it a popular algorithm in machine learning and data mining.

Although the random sampling process addresses over-fitting in the RF model, the decision trees differ in prediction ability. In traditional RF, decision trees with different prediction ability have the same voting weight, which affects the stability of RF. For these reasons, this paper puts forward an optimized voting method to further improve the prediction ability of the RF model. We assign each decision tree classifier a voting weight according to its prediction ability and the statistical characteristics of the dataset. The weighted voting process improves the accuracy and efficiency of the RF model.

In this paper, we first study the traditional RF algorithm and then focus on optimizing its voting process. We propose several optimization methods that target the problems in RF's voting process, and we verify their rationality and superiority through experiments on several public datasets. The main work of this paper includes:

(1) We propose 4 methods for calculating the voting weights of decision tree classifiers, based on the classifiers' prediction ability and the statistical characteristics of the datasets: out-of-bag (OOB) evaluation, the correlation coefficient of the dataset, the chi-square statistic, and mutual information. Experiments on 8 different datasets show that the weighted voting process can improve the prediction ability of RF.

(2) We present a method named half-voting for the weighted RF model. We first sort all decision trees in descending order of voting weight and then set a half-voting stop condition in RF's prediction process. When the stop condition is triggered, prediction ends early, which improves RF's prediction speed. Experiments on 4 different datasets show that the half-voting method noticeably improves RF's prediction speed without affecting its accuracy.
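The abstract does not give the exact weight formulas or the precise half-voting stop condition, so the following is only a minimal sketch under stated assumptions: each tree's weight is taken to be its OOB accuracy, and prediction stops as soon as one class has accumulated more than half of the total voting weight (at which point the remaining trees cannot change the winner). The function names `oob_weights`, `weighted_vote`, and `half_vote` are hypothetical, and each "tree" is represented by any callable that maps a sample to a class label.

```python
from collections import defaultdict


def oob_weights(oob_results):
    """Assumed weighting scheme: a tree's weight is its OOB accuracy.

    oob_results[t] is a list of (predicted, actual) pairs for the
    out-of-bag samples of tree t.
    """
    weights = []
    for pairs in oob_results:
        correct = sum(1 for predicted, actual in pairs if predicted == actual)
        weights.append(correct / len(pairs) if pairs else 0.0)
    return weights


def weighted_vote(trees, weights, sample):
    """Poll every tree and return the class with the largest total weight."""
    totals = defaultdict(float)
    for tree, weight in zip(trees, weights):
        totals[tree(sample)] += weight
    return max(totals, key=totals.get)


def half_vote(trees, weights, sample):
    """Poll trees in descending weight order and stop early.

    Assumed stop condition: one class holds more than half of the total
    weight. Returns (label, number_of_trees_polled).
    """
    order = sorted(range(len(trees)), key=lambda i: weights[i], reverse=True)
    half = sum(weights) / 2.0
    totals = defaultdict(float)
    for polled, i in enumerate(order, start=1):
        label = trees[i](sample)  # a tree is evaluated only when polled
        totals[label] += weights[i]
        if totals[label] > half:
            return label, polled
    # No class passed the threshold: fall back to the full weighted vote.
    return max(totals, key=totals.get), len(trees)


if __name__ == "__main__":
    # Toy stand-ins for trained trees: each ignores the sample.
    trees = [lambda x: "a", lambda x: "b", lambda x: "a"]
    weights = oob_weights([
        [(1, 1), (0, 0), (1, 0), (1, 1)],
        [(1, 1), (0, 1)],
        [(0, 0), (1, 1)],
    ])
    print(weighted_vote(trees, weights, None))
    print(half_vote(trees, weights, None))
```

Under this assumed stop condition, an early stop always returns the same class as the full weighted vote, which is consistent with the claim that speed improves without affecting accuracy. The two functions can differ only in how exact ties are broken.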
Keywords/Search Tags:Random Forests, Voting Weight, Half-Voting