Research On Imbalanced Classification Method Based On XGBoost

Posted on: 2019-04-18  Degree: Master  Type: Thesis
Country: China  Candidate: Z C Wan  Full Text: PDF
GTID: 2348330542497646  Subject: Computer application technology
Abstract/Summary:
At present, research on the classification of imbalanced data is mainly divided into the data level, the algorithm level, and the evaluation level. Traditional ensemble learning algorithms are prone to over-fitting on imbalanced data, and their classification performance is not ideal. To address this, this thesis studies imbalanced data classification and feature selection by combining feature selection and parameter optimization algorithms with the XGBoost ensemble learning algorithm. The main research work of this thesis can be summarized as follows:

(1) A Relief algorithm for imbalanced data classification is proposed. It solves the problem that the traditional Relief algorithm may assign excessively large spurious weights under random sampling. Moreover, it can select the features that are more favorable to the classification of the minority class.

(2) On the basis of the improved Relief feature selection algorithm, a new imbalanced classification method based on Relief feature selection and GP parameter optimization is proposed. The method uses the improved Relief algorithm to select the features that are more helpful for classifying the minority class, then uses the XGBoost algorithm to classify, and uses Gaussian process (GP) Bayesian optimization to find the optimal XGBoost hyperparameter combination. Tests on 8 UCI data sets show that the method can effectively improve the classification performance on imbalanced data.

(3) The improved Relief algorithm requires the feature weight threshold to be set manually; a threshold that is too large or too small may eliminate relatively important features or retain redundant ones. This thesis therefore presents a new rough set model based on rough set theory. To obtain a better approximation of the target concept in the multi-granulation rough set model, the intuitionistic fuzzy rough set and the multi-granulation rough set are first combined, and an intuitionistic fuzzy multi-granulation rough set model is proposed. Because this model's approximation of the target is too loose, a variable intuitionistic fuzzy multi-granulation rough set model is then proposed by introducing parameters, which improves the model, and the validity of the model is proved. Finally, on the basis of this model, the corresponding approximate distribution reduction algorithm is presented. In the simulation results, the proposed lower approximate distribution reduction retains 2 to 4 more attributes than the existing fuzzy multi-granulation decision-theoretic rough set and multi-granulation double-quantitative decision-theoretic rough set, while the proposed upper approximate distribution reduction retains 1 to 5 fewer attributes than those two existing algorithms; meanwhile, the approximation accuracy of the reduction results is more reasonable. Thus, theory and experiments verify that the proposed variable intuitionistic fuzzy multi-granulation rough set model is superior in terms of approximation and dimension reduction.

(4) Based on the approximate distribution reduction algorithm of the variable intuitionistic fuzzy multi-granulation rough set model, a new imbalanced classification method based on rough set attribute reduction and GP parameter optimization is proposed. The experimental results show that, compared with traditional ensemble learning algorithms and the algorithm proposed in the third chapter, the method classifies imbalanced data more effectively, achieving good results on both the F-Measure and AUC evaluation indices.
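
For readers unfamiliar with Relief, the weighting idea behind item (1) can be sketched as follows. This is the classic binary Relief procedure, not the improved variant proposed in the thesis, whose details the abstract does not give. The function name `relief_weights` and the `sample_class` option (which restricts sampling to one class, for example the minority class) are hypothetical and used only for illustration.

```python
import random


def relief_weights(X, y, n_samples=None, sample_class=None, seed=0):
    """Classic binary Relief feature weights (illustrative sketch only).

    A feature gains weight when it differs on the nearest row of the other
    class (the "miss") and loses weight when it differs on the nearest row
    of the same class (the "hit"). Assumes numeric features and at least
    two rows per class.
    """
    n, d = len(X), len(X[0])

    # Scale each feature difference by the feature's range so that
    # features measured in different units stay comparable.
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    # sample_class is a hypothetical knob, not taken from the thesis:
    # when set, only rows of that class are used as sampling points.
    candidates = [i for i in range(n)
                  if sample_class is None or y[i] == sample_class]

    if n_samples is None:
        picks = candidates  # visit every candidate once, deterministically
    else:
        rng = random.Random(seed)
        picks = [rng.choice(candidates) for _ in range(n_samples)]

    weights = [0.0] * d
    for i in picks:
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(X[i], X[miss], j)
                           - diff(X[i], X[hit], j)) / len(picks)
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
     [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(relief_weights(X, y))
```

Features whose weight falls below a chosen threshold would then be discarded. Item (3) above notes that choosing this threshold by hand is the weakness that motivates the rough set approach.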
Keywords/Search Tags: Feature Selection, Ensemble Learning, Imbalanced Data Classification, Intuitionistic Fuzzy Rough Set, Variable Multi-granulation Rough Set, Approximate Distribution Reduction