Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The big data era has witnessed a growing number of classification tasks with large-scale yet extremely imbalanced and low-quality datasets, under which most existing learning methods suffer from poor performance or low computational efficiency. To tackle this problem, we conduct an in-depth investigation into the nature of class imbalance, which reveals that not only the disproportion between classes but also other difficulties embedded in the data itself, especially noise and class overlapping, prevent us from learning effective classifiers. Taking these factors into consideration, this thesis proposes two novel ensemble-based imbalanced learning solutions; extensive experiments and analysis validate their effectiveness. Further, this thesis describes a modular, flexible, and easily extensible class-imbalance/long-tail machine learning library, designed to facilitate the standardization of research code in this area, the reproducibility of results, and application to real-world machine learning tasks.

This thesis contains three main chapters. In the first chapter, we point out that in addition to the class-wise imbalance itself, the intra-class imbalanced distribution should also be taken into consideration when designing new imbalanced learning algorithms. In light of this, we present the Duple-Balanced Ensemble (DUBE), a versatile ensemble learning framework. Unlike prevailing methods, DUBE directly performs inter-class and intra-class balancing without relying on heavy distance-based computation, which allows it to achieve competitive performance while remaining computationally efficient.

In the second chapter, we find that typical imbalanced learning (IL) methods, including resampling and reweighting, were designed based on heuristic assumptions. They often suffer from unstable performance, poor applicability, and high computational cost in complex tasks where these assumptions do not hold. We therefore introduce a novel ensemble IL framework named Mesa. It adaptively resamples the training set over iterations to obtain multiple classifiers, which form a cascade ensemble model. Rather than following random heuristics, Mesa directly learns the sampling strategy from data to optimize the final evaluation metric. Moreover, unlike prevailing meta-learning-based IL solutions, we decouple model training and meta-training in Mesa by independently training the meta-sampler over task-agnostic meta-data. This makes Mesa generally applicable to most existing learning models, and allows the trained meta-sampler to be efficiently applied to new tasks.

Finally, the third chapter describes the imbalanced machine learning library IMBENS, which integrates 16 popular ensemble-based imbalanced learning methods and 19 resampling methods. By taking advantage of high-level abstraction, inheritance, and polymorphism, the library is modular, flexible, and easily extensible, so as to promote the development and application of imbalanced learning techniques.
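To make the resampling idea that runs through these chapters concrete, the sketch below shows inter-class balancing via random undersampling, one of the simplest resampling strategies. This is an illustrative toy using only the Python standard library, not the thesis's actual algorithms or the IMBENS API; the helper name `random_undersample` is hypothetical.

```python
import random
from collections import defaultdict

def random_undersample(X, y, seed=42):
    """Hypothetical helper for illustration only: balance classes by
    randomly undersampling every class down to the size of the
    smallest (minority) class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    # Target size: the minority class count.
    n_min = min(len(samples) for samples in by_class.values())
    X_bal, y_bal = [], []
    for label, samples in by_class.items():
        # Draw n_min samples without replacement from each class.
        for xi in rng.sample(samples, n_min):
            X_bal.append(xi)
            y_bal.append(label)
    return X_bal, y_bal

# Toy imbalanced dataset: 8 majority-class vs. 2 minority-class samples.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(sorted(y_bal))  # -> [0, 0, 1, 1]: both classes reduced to 2 samples
```

Distance-based resamplers (e.g. neighborhood-aware cleaning) replace the random draw with nearest-neighbor computations, which is exactly the cost that DUBE, as described above, is designed to avoid.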