Research On Ensemble Learning Algorithm For Imbalanced Data

Posted on:2020-08-26

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z M Zhang

Full Text:PDF

GTID:1368330578471856

Subject:Computer software and theory

Abstract/Summary:

Imbalanced data sets are widespread in production activities and daily life, arising either from the nature of the data itself or from human factors in the sampling process. In such data sets, the minority samples often correspond to abnormal but important situations. In many practical applications, however, traditional methods have difficulty classifying and identifying these minority samples effectively.

As an important research branch of data mining, ensemble learning has received wide attention from researchers. By combining multiple base learners, ensemble learning can significantly improve the generalization ability of a learning system, which gives it an advantage over a traditional single data mining algorithm. This dissertation studies the classification and clustering of imbalanced data, using ensemble learning as the tool, and proposes several algorithms to improve performance on both tasks. At the data level, the focus is on how to adjust the sample distribution reasonably and effectively. At the algorithm level, the focus is on how to optimize and improve the parameters of existing algorithms. The main research contents of this dissertation are as follows:

(1) K-AdaBoost clustering ensemble algorithm based on under-sampling technique
A K-AdaBoost algorithm is proposed that combines the AdaBoost algorithm with K-means clustering to deal with imbalanced data sets. First, the algorithm applies K-means-based under-sampling to reduce the number of majority samples, balancing the data set without destroying its structure. Secondly, the K-means algorithm is applied again to the newly obtained training set to produce multiple clusters. The distances between a test sample and the cluster centers are then used to compute the weights of the base learners for that test sample. Finally, the base learners are combined into a strong learner according to these weights, and the strong learner predicts the test samples.

(2) R-AdaBoost classification ensemble algorithm based on ADASYN
An ensemble classification algorithm, R-AdaBoost, based on ADASYN is proposed for imbalanced data sets. First, the algorithm uses ADASYN to generate m synthetic samples, which balance the original data set. Secondly, base learners classify the resulting data set, yielding a classification result for each base classifier. When the sample weights are updated, the idea of the Focal Loss function is introduced to increase the weights of hard-to-classify samples. Finally, the test samples are classified by the AdaBoost algorithm to obtain the final classification result.

(3) EOS-Bagging ensemble learning algorithm based on evolutionary over-sampling
The EOS-Bagging (Evolutionary Over-sampling) algorithm is proposed for imbalanced data sets, based on an improved SMOTE sampling technique. First, the minority samples are randomly over-sampled. Secondly, building on the SMOTE algorithm and the genetic algorithm, selection, crossover and mutation operations are applied to the minority samples of the new data set. Finally, at the algorithm level, the method is combined with the Bagging ensemble learning framework: base learners are trained on the data set containing the synthetic samples and produce the predictions for the test samples.

Experiments show that the algorithms proposed in the dissertation achieve some improvement in classification and clustering performance on imbalanced data sets.
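
Both ADASYN in (2) and SMOTE in (3) generate synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that shared interpolation step only, assuming Euclidean distance and uniform selection of the base sample. It is not the dissertation's R-AdaBoost or EOS-Bagging procedure: ADASYN's density-based allocation and the evolutionary operators are omitted, and the names `smote_like_oversample`, `k` and `seed` are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    k = min(k, len(minority) - 1)      # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate: new = base + lam * (neighbour - base), with lam in [0, 1).
        lam = rng.random()
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (illustrative only): 4 minority samples against 10 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10
new_points = smote_like_oversample(minority, majority_size - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two existing minority samples, the over-sampled class stays inside the region the minority class already occupies, which is the property that an evolutionary variant could then refine through selection, crossover and mutation.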

Keywords/Search Tags:

Imbalanced data set, Ensemble learning, Classification, Clustering, Machine Learning

Related items

1	Research On Imbalanced Dataset Classification Based On Ensemble Learning
2	Hybrid Ensemble Learning For Imbalanced Data
3	Two-class Imbalanced Big Data Classification Based On Data Reduction And Ensemble Learning
4	Research And Application Of Imbalanced Data Classification Algorithm Based On Ensemble Learning
5	Research Of Imbalanced Data Classification Method Based On Oversampling And Ensemble Learning
6	Research On Imbalanced Data Classification Algorithms Based On Ensemble Learning
7	Research On Ensemble Learning
8	Research On Ensemble And Imbalanced Based Supervised/Unsupervised Learning Methods And Application
9	Research On Imbalanced Data Classification Methods Based On Ensemble Learning
10	Imbalanced Data Classification Algorithm Based On Unsupervised Intelligent Under Sampling Method