
Research On Random Forest Algorithm Based On Feature Selection And Diversity

Posted on: 2021-03-20  Degree: Master  Type: Thesis
Country: China  Candidate: S F Luo  Full Text: PDF
GTID: 2428330614958324  Subject: Electronic and communication engineering
Abstract/Summary:
With the advent of the Internet of Everything, development across all walks of life has become inseparable from the Internet, and the fields it touches are flooded with large volumes of complex data. The resulting "information overload" makes it difficult for users and enterprise platforms to mine the key information they need from massive data. Ensemble classification models trained with machine learning can handle such large-scale data-processing tasks effectively, but they are limited by difficulties in fitting data features and by the generalization error of the ensemble. This thesis therefore takes the random forest ensemble algorithm as its basis and studies the ensemble model by improving both the base classifier and the ensemble strategy. The main work and improvements of this thesis are as follows:

1. In the data-preprocessing stage of the classification model, to address the difficulty of partitioning feature attributes and of fitting the data set during data processing and feature selection, this design takes features as the primary basis and comprehensively analyzes the correlations among data, features, and categories. A high-efficiency feature subset is filtered out through a feature-importance measure and a p-value test, and the random forest model is then used to study classification accuracy. Simulation on experimental data shows that such a feature subset effectively solves the feature-attribute partitioning problem, thereby improving the precision and recall of the random forest ensemble model.

2. In the large-scale data-classification stage, to address the generalization error caused by redundant base classifiers and insufficient diversity within the random forest ensemble algorithm, this thesis designs an extreme random forest ensemble algorithm that combines feature information and diversity. The algorithm first uses the efficient feature subsets screened by the p-value test and adopts the random tree as the base classification model to introduce more randomness. It then performs structural-redundancy analysis on the random trees to avoid duplicated features at the nodes. Finally, weighted majority voting is used to construct an extreme random forest model with high classification accuracy and diversity. Simulations on several experimental data sets show that the proposed algorithm effectively reduces the model's generalization error and improves the fault tolerance and data-fitting ability of the ensemble algorithm.
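The two stages described above can be sketched as follows. This is a minimal illustration under assumed details, not the thesis's actual implementation: the p-value test is realized here as a univariate ANOVA F-test, feature importance comes from a preliminary random forest, and the per-tree voting weights are accuracies on the rows each bootstrap left out. Thresholds, ensemble sizes, and the synthetic data set are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeClassifier

X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1: filter a feature subset by importance measure and p-value test.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
_, pvals = f_classif(X_tr, y_tr)                      # univariate p-values
keep = (rf.feature_importances_ > np.median(rf.feature_importances_)) \
       & (pvals < 0.05)
X_tr_s, X_te_s = X_tr[:, keep], X_te[:, keep]

# Stage 2: extremely randomized trees as base classifiers, combined by
# weighted majority voting; each tree is weighted by its accuracy on the
# rows its bootstrap sample did not draw.
rng = np.random.RandomState(0)
trees, weights = [], []
for _ in range(50):
    idx = rng.randint(0, len(X_tr_s), len(X_tr_s))    # bootstrap sample
    oob = np.setdiff1d(np.arange(len(X_tr_s)), idx)   # held-out rows
    t = ExtraTreeClassifier(random_state=rng).fit(X_tr_s[idx], y_tr[idx])
    trees.append(t)
    weights.append(t.score(X_tr_s[oob], y_tr[oob]) if len(oob) else 1.0)

def weighted_vote(X):
    """Accumulate each tree's weight on its predicted class, take argmax."""
    votes = np.zeros((len(X), 2))
    for t, w in zip(trees, weights):
        votes[np.arange(len(X)), t.predict(X)] += w
    return votes.argmax(axis=1)

acc = (weighted_vote(X_te_s) == y_te).mean()
print(f"weighted-vote ensemble accuracy: {acc:.2f}")
```

The weighting step is where this sketch departs from a plain random forest: a uniform majority vote would treat an overfit, redundant tree the same as a well-generalizing one, whereas accuracy-based weights down-weight weak members of the ensemble.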
Keywords/Search Tags: ensemble classification model, random forest, feature selection, p-value test, extreme random forest