
Research On Decision Tree Classification Method Of Imbalanced Data Based On Reinforcement Learning

Posted on: 2019-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z Niu
Full Text: PDF
GTID: 2348330569479967
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, with the popularization of the Internet and advances in informatization, industries are generating more and more data. Rapid classification and recognition is key to improving the speed of intelligent information processing and accelerating the development of related industries. Although the total amount of data keeps growing, some classes of data still account for only a very small share; data sets of this kind are imbalanced, and it is usually these minority-class samples that are the focus of research. At present, existing classifiers perform poorly at identifying minority-class samples when the data is imbalanced. Based on an analysis of imbalanced data distributions, this thesis presents an improved redundancy-removed under-sampling algorithm for preprocessing imbalanced data sets, and, by studying decision tree classification and reinforcement learning, proposes a new ensemble forest classification model. The main work of this thesis is as follows:

Firstly, an improved under-sampling method based on clustering fusion and redundancy removal is proposed, applied to the preprocessing of imbalanced data before classification, and compared with existing under-sampling methods. By analyzing the defects of existing under-sampling algorithms in light of the distribution of imbalanced data sets, this thesis introduces the concept of a similarity redundancy coefficient and under-samples the data set according to this coefficient. Results show that the method significantly improves the minority-class true positive rate and the G-mean value while leaving the overall classification accuracy essentially unchanged.

Secondly, a decision tree optimization model based on a reinforcement learning cumulative-return attribute selection method is proposed. By analyzing the principles of reinforcement learning and combining them with the growth process of a decision tree, a cumulative-return attribute selection method is derived. The cumulative-return factor is integrated into attribute selection at each split node of the decision tree, strengthening the tree's classification of minority-class samples. Experiments comparing the cumulative-return method against a cost-sensitive decision tree and the original decision tree classification model demonstrate the effectiveness of the method.

Thirdly, building on the random forest algorithm, an improved ensemble forest algorithm based on same-distribution random sampling is presented. By analyzing the principles of the random forest algorithm together with the distribution characteristics of imbalanced data sets, this thesis proposes a new same-distribution sampling method. The sample subsets obtained in this way not only preserve the distribution of the original data set but also reduce its imbalance rate. The ensemble forest algorithm is formed by combining the cumulative-return attribute selection method with the same-distribution sampling method. Finally, the effectiveness of the proposed ensemble forest algorithm is verified by experiments.
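The abstract does not give the formula for the similarity redundancy coefficient, so the under-sampling step can only be sketched under an assumption: here it is treated as a nearest-neighbour distance test, where a majority-class sample is considered redundant if it lies within a threshold distance of a sample already kept. The function name, the threshold parameter, and the synthetic data are all illustrative, not taken from the thesis.

```python
import math
import random

def undersample_redundant(majority, threshold):
    """Keep a majority-class sample only if no already-kept sample lies
    within `threshold` of it; closer samples are treated as redundant.
    This is a stand-in for the thesis's similarity redundancy coefficient,
    whose exact definition is not given in the abstract."""
    kept = []
    for x in majority:
        if all(math.dist(x, k) >= threshold for k in kept):
            kept.append(x)
    return kept

random.seed(0)
# Synthetic 2-D majority class: 200 points in one tight Gaussian cluster.
majority = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
reduced = undersample_redundant(majority, threshold=0.3)
print(len(majority), "->", len(reduced))
```

Because dense regions are thinned the most, this kind of pruning tends to preserve the shape of the majority-class distribution while shrinking its size, which matches the abstract's claim that overall accuracy is roughly unchanged.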
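The cumulative-return attribute selection can likewise be sketched under an assumption: the split score is ordinary information gain plus a reward that accrues whenever a child node is purer in the minority class than its parent, so splits that isolate minority samples are preferred. The reward value and the scoring formula are hypothetical; the thesis's actual cumulative-return definition is not stated in the abstract.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def cumulative_return_score(parent, splits, minority=1, reward=0.5):
    """Information gain plus an assumed cumulative-return bonus: each child
    whose minority-class fraction exceeds the parent's earns `reward`."""
    n = len(parent)
    gain = entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)
    parent_frac = parent.count(minority) / n
    bonus = sum(reward for s in splits
                if s and s.count(minority) / len(s) > parent_frac)
    return gain + bonus

parent = [0] * 8 + [1] * 2               # imbalanced node: 8 majority, 2 minority
split_a = [[0] * 8, [1] * 2]             # cleanly isolates the minority class
split_b = [[0] * 4 + [1], [0] * 4 + [1]] # children as mixed as the parent
print(cumulative_return_score(parent, split_a),
      cumulative_return_score(parent, split_b))
```

Under this scoring, split_a outranks split_b both on gain and on the bonus, illustrating how folding a minority-class reward into node splitting biases the tree toward the minority class, as the abstract describes.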
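One plausible reading of the same-distribution sampling step, sketched here purely as an assumption: each tree's bootstrap draws with replacement from each class separately, so every subset follows each class's own distribution, while capping the majority:minority ratio to reduce the imbalance rate. The `ratio` knob and the placeholder data are illustrative, not taken from the thesis.

```python
import random

def same_distribution_bag(majority, minority, ratio=2.0, rng=random):
    """Build one bootstrap whose majority:minority ratio is capped at
    `ratio`. Drawing with replacement from each class keeps that class's
    own distribution; `ratio` is an assumed parameter."""
    n_min = len(minority)
    bag_maj = [rng.choice(majority) for _ in range(int(ratio * n_min))]
    bag_min = [rng.choice(minority) for _ in range(n_min)]
    return bag_maj + bag_min

random.seed(1)
majority = list(range(1000))         # placeholder majority-class samples
minority = list(range(1000, 1050))   # 50 minority samples, imbalance 20:1
bag = same_distribution_bag(majority, minority)
print(len(bag))  # 2*50 + 50 = 150, imbalance reduced from 20:1 to 2:1
```

Training one cumulative-return decision tree per such bag and voting their predictions would then yield the ensemble forest the abstract describes.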
Keywords/Search Tags: imbalanced data set, clustering, redundancy-removed under-sampling, cumulative return, ensemble forest