
Research On Transfer-sampling Based Method For Class-imbalance Learning

Posted on: 2018-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: S T Wang
GTID: 2348330542453039
Subject: Computer Science and Technology
Abstract/Summary:
The class imbalance problem is one of the major challenges in many real-world applications, where some classes are much smaller than others and the smaller classes are the more important ones. In class imbalance learning, the commonly used performance evaluation criteria are AUC, F-measure, or G-mean rather than accuracy.

Random oversampling is a simple and effective class imbalance learning algorithm, but it carries a risk of overfitting. To reduce that risk, the SMOTE algorithm oversamples the minority classes by generating synthetic samples. However, it may introduce noise and aggravate the "overlapping" problem between classes. In other words, synthetic samples and real samples are not independently and identically distributed (i.i.d.). To generate synthetic samples that are more consistent with the ground-truth data distribution, a series of improved algorithms have been proposed. These either use the neighbor information of the minority class samples to guide the sampling process or estimate the ground-truth data distribution in order to generate minority class samples. Despite these strategies, it cannot be guaranteed that the synthetic samples fully obey the ground-truth data distribution when the minority class samples are absolutely rare.

It is therefore necessary to recognize that the synthetic samples do not obey the ground-truth data distribution. This leads to a problem worth studying: how to use these synthetic minority class samples effectively to tackle the class imbalance problem. This thesis assumes that, although the synthetic minority class samples are not consistent with the ground-truth data distribution, they are highly correlated with it. The idea of transfer learning is therefore exploited to make use of these highly related synthetic minority class samples in tackling the class imbalance problem. Our work includes the following:

1) A Boosting-based class-imbalance learning algorithm, TrasoBoost, is proposed. In each iteration, the algorithm decreases the weights of the misclassified synthetic minority class samples, increases the weights of the misclassified original samples, and keeps the weights of correctly classified samples unchanged. Thus, after several iterations, the weights of the non-i.i.d. synthetic samples gradually decrease, thereby reducing their impact on the learning process. Experimental results show that TrasoBoost is superior to a variety of popular class imbalance learning algorithms.

2) A large-margin-based transfer learning algorithm, TrSVMs, is proposed. Unlike the AUX-SVMs algorithm, TrSVMs learns separate hyperplanes for the source and target domains to meet the challenge of a large divergence between the source and target domain distributions. Experimental results show that TrSVMs is superior to AUX-SVMs. Tackling the class imbalance problem based on TrSVMs is left to future work.
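
The per-iteration reweighting rule described for TrasoBoost can be illustrated with a minimal sketch. The abstract gives only the direction of each change, not the update formula, so the function name `update_weights` and the multiplicative factors `beta_syn` and `beta_orig` are hypothetical choices made for illustration and should not be read as the thesis's actual method.

```python
def update_weights(weights, is_synthetic, misclassified,
                   beta_syn=0.5, beta_orig=2.0):
    """Apply one reweighting step in the spirit of the abstract's description.

    weights       -- current sample weights (list of floats)
    is_synthetic  -- True where the sample is a synthetic minority sample
    misclassified -- True where the current weak learner got the sample wrong
    beta_syn      -- ASSUMED factor in (0, 1): shrinks misclassified synthetic samples
    beta_orig     -- ASSUMED factor > 1: grows misclassified original samples
    """
    if not (len(weights) == len(is_synthetic) == len(misclassified)):
        raise ValueError("all inputs must have the same length")
    if not (0.0 < beta_syn < 1.0) or beta_orig <= 1.0:
        raise ValueError("expected 0 < beta_syn < 1 and beta_orig > 1")

    updated = []
    for w, synthetic, wrong in zip(weights, is_synthetic, misclassified):
        if not wrong:
            updated.append(w)              # correctly classified: unchanged
        elif synthetic:
            updated.append(w * beta_syn)   # misclassified synthetic: decrease
        else:
            updated.append(w * beta_orig)  # misclassified original: increase
    return updated


# Two synthetic and two original samples, one of each misclassified.
new_weights = update_weights(
    weights=[0.25, 0.25, 0.25, 0.25],
    is_synthetic=[True, True, False, False],
    misclassified=[True, False, True, False],
)
print(new_weights)
```

Repeating this step over several boosting rounds shrinks the weight of synthetic samples that the weak learners keep getting wrong, which matches the stated goal of reducing the influence of non-i.i.d. synthetic samples. Whether and how the weights are renormalized between rounds is not specified in the abstract, so the sketch leaves that out.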
Keywords/Search Tags:class imbalance, oversampling, SMOTE, transfer learning, Boosting