Research On Random Forest Similarity Algorithm

Posted on:2019-11-06

Degree:Master

Type:Thesis

Country:China

Candidate:C Ma

GTID:2428330566989338

Subject:Engineering

Abstract/Summary:

Random forest is an important and widely used method in machine learning and data mining. It offers high classification performance, has few parameters to tune, is fast and efficient to compute, is resistant to overfitting, and tolerates noise well. Because of this performance, it has been applied successfully in many fields and has attracted widespread attention. Although random forest has been studied extensively, with many notable results, it still has limitations and room for improvement.

First, building on a study of the existing methods for calculating sample similarity in random forest, two improved methods are proposed: one based on feature importance and one based on shared attributes in the decision tree. The former ties the similarity between two samples that fall in the same leaf node to the position of that leaf node. The latter considers samples that fall in different leaf nodes whose class labels agree, and ties their similarity to the number of identical attributes in the decision tree.

Second, to address the weakness of random forest on imbalanced data and the marginalization problem of the SMOTE algorithm when selecting new negative samples, the KMS_SMOTE algorithm is proposed. KMS_SMOTE first uses the K-Means algorithm to partition the original negative samples into two clusters and computes the center of each. Starting from these two centers, it then selects new negative samples, so that the selected samples converge toward the center of the original negative class. Finally, it applies SMOTE to the new negative samples to obtain the new data set. This method effectively overcomes the defects of the SMOTE algorithm and improves the classification performance of random forest.

Finally, the improved sample similarity methods and the KMS_SMOTE algorithm are each evaluated on data sets from the UCI Machine Learning Repository, and the experiments verify the effectiveness of both.
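For readers unfamiliar with the baseline that the improved similarity methods build on, the commonly used random forest proximity counts the fraction of trees in which two samples land in the same leaf. The following is a minimal sketch of that baseline only, assuming the leaf reached by each sample in each tree is already known. The function name `proximity_matrix` and the toy `leaves` data are hypothetical and not taken from the thesis. The abstract does not give the formulas of the two improved methods, so they are not shown.

```python
from typing import List, Sequence


def proximity_matrix(leaf_ids: Sequence[Sequence[int]]) -> List[List[float]]:
    """Baseline random forest proximity (not the thesis's improved variants).

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    The proximity of samples i and j is the fraction of trees in which
    both samples reach the same leaf.
    """
    n_samples = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i, n_samples):
            # Count the trees where the two samples share a leaf.
            shared = sum(
                1 for t in range(n_trees) if leaf_ids[i][t] == leaf_ids[j][t]
            )
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Made-up leaf assignments: 3 samples, 4 trees (illustrative only).
leaves = [
    [1, 4, 2, 7],
    [1, 4, 3, 7],
    [2, 5, 3, 6],
]
prox = proximity_matrix(leaves)
```

In this toy data, samples 0 and 1 share a leaf in three of the four trees, so their proximity is 0.75. With scikit-learn, a leaf-index array of this shape can be obtained from a fitted forest via `RandomForestClassifier.apply(X)`.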

Keywords/Search Tags:

random forest, sample similarity, imbalanced data, SMOTE, K-Means

Related items

1	Research On The Method Of Solving Imbalanced Classification Problems Based On Random Forest Algorithm
2	Research On Parallel Random Forest And Fuzzy C-Means Algorithm For Imbalanced Data
3	Research On The Expansion And Classification Of Several Imbalanced Data Sets Based On C-SMOTE Algorithm
4	Classification Learning Of Imbalanced Data Sets Based On Sampling Processing
5	The Research Of Web Pages Filtering Based On Random Forests Algorithms
6	Research For Imbalanced Big Data Classification Algorithm On Random Forest
7	Research And Application Of Classification Technology For Unbalanced Data
8	The Study On Random-SMOTE For The Classification Of Imbalanced Data Sets
9	Research On Imbalanced Data Classification Method Based On Random Forest Algorithm
10	Class-Imbalanced Data Stream Classification Method Based On Adaptive Random Forest