The Research On Optimization Of Random Forest Algorithm Based On Rough Set

Posted on: 2020-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: B Xing
Full Text: PDF
GTID: 2428330578458180
Subject: Computer technology
Abstract/Summary:
In the era of big data, a single classifier can no longer cope with increasingly complex and massive data, so multiple-classifier methods have become more important and effective. A multiple-classifier combination classifies with several base classifiers and synthesizes their results into a final result. Random forest is one such multiple classifier. One source of randomness in the random forest algorithm is that a fixed number of features is selected at random from the full feature set, so as to reduce the correlation between trees as much as possible. However, data sets often contain redundant features, and because the selection is random, these redundant features can be drawn, which affects the generalization ability of the model.

To address the redundancy in random forest feature selection, and after analyzing the traditional random forest algorithm, this thesis uses rough sets to optimize and improve it. Rough sets can simplify the data and obtain a minimal expression of knowledge while retaining the key information, so they can effectively handle the redundant features that degrade the classification performance of the model. On this basis, the thesis optimizes the full feature set before random forest feature selection takes place: a rough set attribute reduction method based on a genetic algorithm removes redundant attributes from the full feature set, so as to improve the efficiency of the random forest algorithm. The thesis has done the following work:

(1) The research status of attribute reduction, of rough set attribute reduction, and of random forest, both in China and abroad, is introduced. The basic theory of rough sets is introduced in detail, and the basic mathematical concepts and properties of the random forest algorithm are studied. The decision tree algorithm is studied in detail, covering the generation of decision trees and the ID3, C4.5, and CART algorithms. Building on decision trees, the construction process of the random forest algorithm is analyzed in detail, including the generation of the random forest data sets, the construction of a single decision tree, and the implementation of the random forest algorithm.

(2) To address the redundancy problem in random forest feature selection, the thesis combines genetic-algorithm-based rough set attribute reduction with the idea of random forest classification and proposes a classification prediction algorithm that combines the two. The genetic-algorithm-based rough set attribute reduction method is applied to multiple UCI data sets and compared with PCA and CHI2, using average accuracy as the objective evaluation measure for the three reduction methods.

(3) The proposed classification prediction algorithm, which combines genetic-algorithm-based rough set reduction with random forest, is implemented in code. It is compared with the classical random forest algorithm on the wine data set and the cervical cancer data set, and evaluated comprehensively using classification accuracy, running time, ROC curve, mean AUC, OOB, and oob_error. It is also compared with several other machine learning algorithms on several machine learning data sets, using average accuracy as the evaluation measure, to verify the effectiveness of the optimized random forest algorithm in classification.

Building on rough set and random forest theory, the thesis first uses genetic-algorithm-based rough set attribute reduction to optimize random forest feature selection, which greatly improves the classification performance of random forest. The combination of genetic-algorithm-based rough set attribute reduction and random forest classification is therefore not only methodologically novel to a certain degree, but also valuable in practical applications.
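
The idea of rough set attribute reduction described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the thesis searches for reducts with a genetic algorithm, whereas this sketch swaps in an exhaustive search that is only practical for a handful of attributes. The function names (`partition`, `dependency`, `minimal_reducts`) and the toy decision table are assumptions made for illustration. The dependency degree is the standard rough set measure, the fraction of objects whose equivalence class under the chosen attributes has a single decision label.

```python
from collections import defaultdict
from itertools import combinations


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in label-pure equivalence classes."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def minimal_reducts(rows, labels):
    """Smallest attribute subsets that keep the full dependency degree.

    Exhaustive search over subsets, used here in place of a genetic
    algorithm; feasible only for small attribute counts.
    """
    n_attrs = len(rows[0])
    full = dependency(rows, labels, range(n_attrs))
    for size in range(1, n_attrs + 1):
        found = [
            subset
            for subset in combinations(range(n_attrs), size)
            if dependency(rows, labels, subset) == full
        ]
        if found:
            return found
    return []


# Hypothetical toy decision table: attribute 2 duplicates attribute 0,
# so one of the two is redundant.
ROWS = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
LABELS = [0, 1, 1, 0]

if __name__ == "__main__":
    print(dependency(ROWS, LABELS, [0]))   # attribute 0 alone
    print(minimal_reducts(ROWS, LABELS))   # smallest sufficient subsets
```

In a pipeline like the one the abstract describes, only the attributes of a chosen reduct would then be passed to the random forest, so that its random feature selection no longer draws from redundant columns.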
Keywords/Search Tags:Rough Set, Decision tree, Random Forest, Genetic Algorithm, Attribute Reduction