Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The big data era has witnessed a growing number of classification tasks with large-scale yet extremely imbalanced and low-quality datasets, under which most existing learning methods suffer from poor performance or low computational efficiency. To tackle this problem, we conduct an in-depth investigation into the nature of class imbalance, which reveals that not only the disproportion between classes but also other difficulties embedded in the data itself, especially noise and class overlapping, prevent us from learning effective classifiers. Taking these factors into consideration, this thesis proposes two novel ensemble-based imbalanced learning solutions; extensive experiments and analysis validate their effectiveness. Further, this thesis describes a modular, flexible, and easily extensible class-imbalance/long-tail machine learning library, designed to facilitate the standardization of research code in this area, the reproducibility of results, and application to real-world machine learning tasks.

This thesis contains three main chapters. In the first chapter, we point out that in addition to the class-wise imbalance itself, the intra-class imbalanced distribution should also be taken into consideration when designing new imbalanced learning algorithms. In light of this, we present the Duple-Balanced Ensemble (DUBE), a versatile ensemble learning framework. Unlike prevailing methods, DUBE directly performs inter-class and intra-class balancing without relying on heavy distance-based computation, which allows it to achieve competitive performance while remaining computationally efficient.

In the second chapter, we find that typical imbalanced learning (IL) methods, including resampling and reweighting, were designed based on heuristic assumptions. They often suffer from unstable performance, poor applicability, and high computational cost in complex tasks where these assumptions do not hold. We therefore introduce a novel ensemble IL framework named Mesa. It adaptively resamples the training set over iterations to obtain multiple classifiers, which form a cascade ensemble model. Rather than following random heuristics, Mesa directly learns the sampling strategy from data to optimize the final evaluation metric. Moreover, unlike prevailing meta-learning-based IL solutions, we decouple model training and meta-training in Mesa by independently training the meta-sampler over task-agnostic meta-data. This makes Mesa generally applicable to most existing learning models, and allows the trained meta-sampler to be efficiently applied to new tasks.

Finally, the third chapter describes the imbalanced machine learning library IMBENS, which integrates 16 popular ensemble-based imbalanced learning methods and 19 resampling methods. By taking advantage of high-level abstraction, inheritance, and polymorphism, the library is modular, flexible, and easily extensible, so as to promote the development and application of imbalanced learning techniques.
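To make the resampling idea that runs through these chapters concrete, the sketch below shows inter-class balancing via random undersampling, one of the simplest resampling strategies. This is an illustrative toy using only the Python standard library, not the thesis's actual algorithms or the IMBENS API; the helper name `random_undersample` is hypothetical.

```python
import random
from collections import defaultdict

def random_undersample(X, y, seed=42):
    """Hypothetical helper for illustration only: balance classes by
    randomly undersampling every class down to the size of the
    smallest (minority) class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    # Target size: the minority class count.
    n_min = min(len(samples) for samples in by_class.values())
    X_bal, y_bal = [], []
    for label, samples in by_class.items():
        # Draw n_min samples without replacement from each class.
        for xi in rng.sample(samples, n_min):
            X_bal.append(xi)
            y_bal.append(label)
    return X_bal, y_bal

# Toy imbalanced dataset: 8 majority-class vs. 2 minority-class samples.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(sorted(y_bal))  # -> [0, 0, 1, 1]: both classes reduced to 2 samples
```

Distance-based resamplers (e.g. neighborhood-aware cleaning) replace the random draw with nearest-neighbor computations, which is exactly the cost that DUBE, as described above, is designed to avoid.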