Research On Transfer-sampling Based Method For Class-imbalance Learning

Posted on:2018-06-10

Degree:Master

Type:Thesis

Country:China

Candidate:S T Wang

Full Text:PDF

GTID:2348330542453039

Subject:Computer Science and Technology

Abstract/Summary:

The class imbalance problem is one of the major challenges in many real-world applications, where some classes are much smaller than others and the smaller classes are the more important ones. In class imbalance learning, performance is commonly evaluated with AUC, F-measure, or G-mean rather than accuracy. Random oversampling is a simple and effective class imbalance learning algorithm, but it carries a risk of overfitting. To reduce this risk, the SMOTE algorithm oversamples the minority classes by generating synthetic samples. However, it may introduce noise and worsen the "overlapping" problem between classes. In other words, synthetic samples and real samples are not independently and identically distributed (i.i.d.). To generate synthetic samples that are more consistent with the ground-truth data distribution, a series of improved algorithms have been proposed that either use the neighbor information of the minority class samples to guide the sampling process or estimate the ground-truth data distribution and generate minority class samples from it. Whatever the strategy, however, there is no guarantee that the synthetic samples fully obey the ground-truth data distribution when the minority class samples are absolutely rare.

It is therefore necessary to accept that the synthetic samples do not obey the ground-truth data distribution. This raises a problem worth studying: how to use these synthetic minority class samples effectively to tackle the class imbalance problem. This thesis assumes that although the synthetic minority class samples are not consistent with the ground-truth data distribution, they are highly correlated with it. The idea of transfer learning is therefore exploited to make use of minority class samples that are highly related to the ground-truth data distribution. Our work includes the following:

1) A Boosting-based class-imbalance learning algorithm, TrasoBoost, is proposed. In each iteration, the algorithm decreases the weights of the misclassified synthetic minority class samples, increases the weights of the misclassified original samples, and keeps the weights of correctly classified samples unchanged. Thus, after several iterations, the weights of the non-i.i.d. synthetic samples gradually decrease, reducing their impact on the learning process. Experimental results show that TrasoBoost is superior to a variety of popular class imbalance learning algorithms.

2) A large-margin-based transfer learning algorithm, TrSVMs, is proposed. Unlike the AUX-SVMs algorithm, TrSVMs learns separate hyperplanes for the source and target domains to cope with the large divergence between the source and target domain distributions. Experimental results show that TrSVMs is superior to AUX-SVMs. Tackling the class imbalance problem with TrSVMs is left to future work.
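
The per-iteration weight update described for TrasoBoost can be illustrated with a minimal sketch. The abstract gives only the direction of each change, not the exact factors, so the function name `update_weights` and the fixed multipliers `beta_syn` and `beta_orig` below are hypothetical placeholders; the thesis itself may well derive these factors from the weighted error of each round, in the manner of AdaBoost or TrAdaBoost.

```python
import numpy as np


def update_weights(weights, is_synthetic, misclassified,
                   beta_syn=0.5, beta_orig=2.0):
    """Apply one round of the weight update sketched in the abstract.

    beta_syn (< 1) and beta_orig (> 1) are assumed constants chosen for
    illustration only; they are not taken from the thesis.
    """
    weights = np.asarray(weights, dtype=float).copy()
    is_synthetic = np.asarray(is_synthetic, dtype=bool)
    misclassified = np.asarray(misclassified, dtype=bool)

    # Misclassified synthetic minority samples: decrease their weight.
    weights[misclassified & is_synthetic] *= beta_syn
    # Misclassified original samples: increase their weight.
    weights[misclassified & ~is_synthetic] *= beta_orig
    # Correctly classified samples are left unchanged.
    return weights
```

With four samples of weight 1.0, of which the first two are synthetic and the first and third are misclassified, the call `update_weights([1, 1, 1, 1], [True, True, False, False], [True, False, True, False])` halves the first weight, doubles the third, and leaves the other two at 1.0. Repeating the update over several rounds shrinks the weights of synthetic samples that keep being misclassified, which is the effect the abstract attributes to the algorithm.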

Keywords/Search Tags:

class imbalance, oversampling, SMOTE, transfer learning, Boosting

Related items

1	Research On The Application Of Generative Adversarial Networks In Class Imbalance
2	Improved Grouped SMOTE With Noise Filtering Mechanism
3	Research On Imbalanced Datasets Classification Based On Machine Learning And Oversampling Methods
4	Research On Partial Label Learning With Class Imbalance And Unlabeled Data
5	Identification Of Encrypted Traffic As Small Sample Of Class-imbalance
6	Two-class Imbalanced Data Classification Based On Diverse Data Generation And Ensemble Learning
7	Effect Of Class Imbalance On Transfer Learning
8	Research On A Class Of Natural Image Classification Algorithm Via Transfer Learning
9	Research On Imbalanced Classification Problems In The Framework Of Transfer Learning
10	A Comparative Study Of Oversampling Techniques Based On Unbalanced Credit Data Sets