
Research On Imbalanced Data Classification Algorithms Based On Weight Analysis Of Loss Function

Posted on: 2022-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: J L Xie
GTID: 2518306569980819
Subject: Computer Science and Technology
Abstract/Summary:
Real-world data often follows an imbalanced distribution. When traditional classification methods are trained on such data, minority-class examples are recognized poorly. This thesis analyzes the characteristics and difficulties of imbalanced data classification and proposes a new sampling method, a weight analysis framework, and a new cost-sensitive loss function. The specific contributions are as follows.

An imbalanced data classification method based on adaptive sampling (Adaptive Sampling Imbalanced data Classification, ASIC) is proposed. Traditional resampling methods mostly use fixed sampling strategies. ASIC instead adjusts the sampling probabilities of the different classes on the training set dynamically, according to the performance of the classification model on the validation set, so that the sampling probability of each class is determined by the current needs of the model. Classes that the model recognizes less well receive a higher sampling probability and therefore more training opportunities. The method also pays extra attention to the minority classes: other conditions being equal, a minority class receives a larger sampling probability, which alleviates the effect of the limited diversity within the minority class and improves the model's ability to recognize minority classes. In addition, because the method uses per-class recall on the validation set to guide the sampling of training examples, it can effectively alleviate overfitting when the minority class is upsampled, improving the model's generalization performance on the minority class.

A weight analysis framework is proposed. With this framework, researchers can easily analyze the weighting strategies of different cost-sensitive loss functions. Moreover, under this framework, different weights can be freely designed for the classifier weight vector and for the example features.

A new cost-sensitive loss function (Composite Weighting Loss, CWL) is proposed. The loss combines an improved LDAM Loss with an improved Focal Loss, so it can impose different margin constraints on different classes and increase the model's attention to difficult examples. A curriculum learning strategy is also introduced into the optimization of the loss, so that the model learns simple samples well in the early stage of training and pays more attention to difficult samples as training progresses. This makes the learning process smoother and improves generalization.

The proposed ASIC sampling method and the CWL cost-sensitive loss are compared with commonly used imbalanced data classification algorithms on several imbalanced datasets. Because ASIC adjusts the sampling strategy dynamically based on the performance of the classification model, models trained with ASIC outperform the other methods in terms of balanced accuracy and geometric mean, and the advantage of ASIC becomes more pronounced as the data distribution becomes more imbalanced. In addition, because CWL uses a curriculum learning strategy to adjust the degree of attention paid to hard examples, models trained with CWL generally recognize imbalanced data better than models trained with other cost-sensitive loss functions.
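The recall-guided sampling idea and the two evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not give the ASIC update rule, so `sampling_probabilities` uses an assumed, hypothetical weighting (lower validation recall and smaller class size both raise the probability); the function names and the `minority_boost` and `eps` parameters are likewise illustrative and not taken from the thesis. Balanced accuracy and geometric mean follow their usual definitions as the arithmetic and geometric means of per-class recall.

```python
import math
import random
from collections import Counter


def per_class_recall(y_true, y_pred, classes):
    """Recall of each class: correctly predicted examples / examples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: (hits[c] / totals[c] if totals[c] else 0.0) for c in classes}


def balanced_accuracy(recalls):
    """Arithmetic mean of the per-class recalls."""
    return sum(recalls.values()) / len(recalls)


def geometric_mean(recalls):
    """Geometric mean of the per-class recalls."""
    return math.prod(recalls.values()) ** (1.0 / len(recalls))


def sampling_probabilities(val_recalls, train_counts, minority_boost=0.5, eps=1e-6):
    """Hypothetical rule (an assumption, not the thesis's formula):
    weight = (1 - validation recall) * (largest class size / class size) ** minority_boost,
    normalized so that the class probabilities sum to 1."""
    max_count = max(train_counts.values())
    raw = {
        c: (1.0 - val_recalls[c] + eps) * (max_count / train_counts[c]) ** minority_boost
        for c in val_recalls
    }
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}


def draw_batch(train_labels, class_probs, batch_size, rng):
    """Draw training indices with replacement; each example's weight is its
    class probability divided by the size of its class."""
    counts = Counter(train_labels)
    weights = [class_probs[y] / counts[y] for y in train_labels]
    return rng.choices(range(len(train_labels)), weights=weights, k=batch_size)


# Toy validation predictions: class 1 is the minority and is recognized less well.
val_true = [0, 0, 0, 0, 1, 1]
val_pred = [0, 0, 0, 0, 1, 0]
recalls = per_class_recall(val_true, val_pred, classes=[0, 1])

train_labels = [0] * 900 + [1] * 100
probs = sampling_probabilities(recalls, Counter(train_labels))
batch = draw_batch(train_labels, probs, batch_size=32, rng=random.Random(0))
```

In a training loop, the recalls would be recomputed on the validation set after each epoch and the probabilities refreshed before the next one, so that the sampler follows the model's current weaknesses rather than a fixed schedule.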
Keywords/Search Tags: Imbalanced distribution data, Adaptive Sampling, Cost sensitive, Curriculum Learning