Research On Active Learning Method For Imbalanced Data Distribution

Posted on:2024-09-13

Degree:Master

Type:Thesis

Country:China

Candidate:D H Yang

GTID:2568306941997519

Subject:Electronic information

Abstract/Summary:

Active learning algorithms perform well on classification tasks with balanced data, but they face challenges when the data distribution is imbalanced. A balanced distribution is an ideal case; in practice, most data is imbalanced, which makes research on imbalanced data distributions crucial. Because minority classes contain fewer examples, they provide fewer features, and it is difficult to extract information from them. In addition, traditional active learning methods are easily influenced by redundant majority-class samples and cannot accurately select the samples that would improve model performance, so they struggle to achieve good results on datasets with imbalanced class distributions. This thesis studies active learning methods for imbalanced data distributions and proposes an active sampling strategy for this setting. The algorithm consists of two parts, which together address the problems that active learning algorithms encounter on imbalanced datasets and improve the model’s performance.

First, we discuss how the loss function is calculated. To address the problem of imbalanced data distribution, we partition the training set by category and propose a new loss function by reshaping the traditional loss function according to the partitioning results. The core idea is to make full use of the information in both labeled and unlabeled examples: by setting thresholds and using the predicted probabilities of examples, the loss reduces the risk of misclassifying an example as a minority class and increases the risk of misclassifying an example as a majority class. This reduces the interference of redundant examples with the model and improves its accuracy. This part deals with the imbalanced data classification problem from an algorithmic perspective.

Second, we discuss the adaptive active sampling strategy. We propose an adaptive active sampling strategy based on the loss function. It switches between sampling strategies according to the imbalance ratio of labeled positive and negative examples. Suitable unlabeled examples are selected for labeling and added to the training set, so that the numbers of labeled positive and negative samples stay balanced during training. This reduces the model’s frequent queries of majority-class samples, avoids ineffective sampling in the active learning process, and makes training more efficient and accurate. This part addresses the imbalanced data classification problem from a data perspective.

We verify the effectiveness of the proposed method on imbalanced datasets with binary labels and compare it with other active learning methods. The results show that the proposed method significantly improves classification performance on imbalanced data classification tasks and outperforms the other methods in classification accuracy and other metrics.
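The abstract does not give the exact loss function or the switching rule, so the Python sketch below only illustrates the two ideas under stated assumptions. It is not the thesis's implementation. The assumptions are: label 1 is the minority (positive) class; the reshaped loss is stood in for by a class-weighted binary cross-entropy; and the sampler switches from uncertainty sampling to querying likely-minority examples once the labeled majority/minority ratio exceeds a threshold. The names `weighted_bce`, `select_queries`, `w_minority`, `w_majority`, and `max_ratio` are hypothetical.

```python
import numpy as np


def weighted_bce(y_true, p_pred, w_minority=4.0, w_majority=1.0, eps=1e-12):
    """Class-weighted binary cross-entropy (label 1 = minority class).

    Assumed stand-in for the reshaped loss: the log-loss of a minority
    example is scaled by w_minority, that of a majority example by
    w_majority, so missing a minority example costs more.
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    per_example = -(w_minority * y * np.log(p)
                    + w_majority * (1.0 - y) * np.log(1.0 - p))
    return float(per_example.mean())


def select_queries(p_unlabeled, y_labeled, batch_size=2, max_ratio=2.0):
    """Pick indices of unlabeled examples to send for labeling.

    p_unlabeled: predicted probability of the minority class (label 1).
    y_labeled:   labels gathered so far (0 = majority, 1 = minority).
    max_ratio:   hypothetical threshold on the majority/minority ratio.
    """
    p = np.asarray(p_unlabeled, dtype=float)
    y = np.asarray(y_labeled)
    n_min = int((y == 1).sum())
    n_maj = int((y == 0).sum())
    ratio = n_maj / max(n_min, 1)  # guard against division by zero

    if ratio > max_ratio:
        # Labeled set is too skewed: query the examples most likely
        # to be minority, to pull the labeled set back toward balance.
        order = np.argsort(-p, kind="stable")
    else:
        # Labeled set is balanced enough: plain uncertainty sampling,
        # i.e. the examples whose probability is closest to 0.5.
        order = np.argsort(np.abs(p - 0.5), kind="stable")
    return order[:batch_size].tolist()
```

In an active learning loop, `select_queries` would be called once per round with the model's current predictions on the unlabeled pool. The returned examples would be labeled and moved into the training set, and the model would then be retrained with `weighted_bce`. The weights and the ratio threshold are illustrative values, not settings reported in the thesis.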

Keywords/Search Tags:

Active learning, Imbalance distribution, Risk minimization, Convolutional neural network

Related items

1	Differentially Private Machine Learning Approaches Via Margin Distribution Optimizing
2	Research On Biased-Based Active Imbalance Learning Algorithm
3	Research On Classification Of Imbalanced Data Based On Convolutional Neural Network
4	Study On The Class Imbalance Problem In Network Intrusion Detection System
5	Multi-view Active Learning Based On Double Branch Network
6	Extensions to fuzzy ARTMAP based on structural risk minimization
7	Research On Risk Assessment Of Class Imbalance Personal Credit Data Based On Improved GAN
8	Research On Image Recognition Based On Optimized Convolutional Neural Network
9	Neural Network Modeling Of Imbalance Missing Data And Its Application
10	Imbalance Learning And Its Application In High Risk Prediction Of Prenatal Screening