
Hybrid Ensemble Learning For Imbalanced Data

Posted on: 2021-03-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K X Yang
Full Text: PDF
GTID: 1488306464482584
Subject: Computer Science and Technology
Abstract/Summary:
In the information age, with the rapid development of large-scale data acquisition and storage technology, extracting valuable information from data has become a central concern in many industries. As an important branch of artificial intelligence, machine learning draws on knowledge from multiple disciplines and uses a variety of methods to model and analyze data for knowledge discovery. It has gradually become a new driving force in many industries and is widely used in scientific research and industrial production. In many practical applications, however, samples are distributed unequally across classes, which is known as the class imbalance problem. Standard machine learning algorithms are designed on the assumption that the training data are balanced, so they tend to perform poorly on imbalanced data. As a result, class imbalance has become an increasingly difficult challenge in practical classification tasks. Within machine learning, ensemble learning has become an active research direction because of its good generalization performance, and it can further improve the performance of many single classification models.

This dissertation studies imbalanced data classification methods combined with ensemble learning. Building on an in-depth exploration of existing strategies for imbalance learning, it focuses on more reasonable and effective resampling methods at the data level, model improvements and cost-sensitive matrix design at the algorithm level, ensemble strategies that account for class imbalance, and ensemble frameworks that incorporate optimization methods. The main contributions are as follows:

(1) Under-sampling methods tend to lose useful information, and cost-sensitive methods are overly sensitive to outliers and noise points. To address both issues, this dissertation proposes a multi-objective hybrid optimization ensemble algorithm. First, a distribution-based resampling strategy is designed to reduce the risk of losing a large amount of information during under-sampling. Second, a density-based under-sampling multi-objective optimization ensemble method (DBUME) is designed using the sample distribution information. Finally, a hybrid ensemble framework is proposed that integrates and optimizes the prediction results of the cost-sensitive method and DBUME using the alternating direction method of multipliers (ADMM). The aim is to offset, to some extent, the respective limitations of the resampling method and the cost-sensitive method. Experiments on a large number of imbalanced data sets compare the proposed algorithm with existing mainstream imbalance learning methods, and the results demonstrate its effectiveness.

(2) Class overlap and the insufficient adaptability of the resampling process both degrade the performance of imbalance learning algorithms. To address this, the dissertation proposes a hybrid ensemble framework that combines metric learning with an adaptive two-stage under-sampling method. Metric learning is used to find a more suitable embedding space for the original imbalanced data set, and the adaptive two-stage under-sampling method considers both informative and representative samples when generating a balanced data set. In addition, to improve generalization, the dissertation proposes a progressive ensemble framework (PHCE), which uses a progressive mechanism with local and global evaluation criteria to select ensemble members and thereby further improves model performance. Extensive comparative experiments on multiple real-world data sets show that PHCE outperforms most imbalance ensemble classification approaches.

(3) The broad learning system is limited in its ability to handle imbalanced data classification. To address this, the dissertation designs a weighted broad learning system (WBLS). To reduce the influence of outliers and noise points in imbalanced data, it further designs a weight generation strategy based on hybrid density, which incorporates the prior distribution information of the data, and proposes an adaptive weighted broad learning system (AWBLS). Finally, an incremental ensemble framework is proposed to further improve the stability and robustness of AWBLS through an incremental ensemble mechanism. Experiments on a large number of real-world data sets demonstrate the superior performance of the proposed algorithm.
Keywords/Search Tags: machine learning, classification algorithm, imbalance learning, ensemble learning, re-sampling, cost-sensitive learning, broad learning system