
Research On Active Learning Method For Imbalanced Data Distribution

Posted on: 2024-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: D H Yang
GTID: 2568306941997519
Subject: Electronic information
Abstract/Summary:
Active learning algorithms have shown strong performance on classification tasks with balanced data, but they face challenges when the data distribution is imbalanced. A balanced distribution is only an ideal case; in practice, most real-world data is imbalanced, which makes research on imbalanced data distributions important. Because minority classes contain few examples, they provide little feature information, which makes it difficult to learn from them. In addition, traditional active learning methods are easily misled by redundant majority-class samples and cannot reliably select the samples that would improve the model, so they struggle to achieve good results on datasets with imbalanced class distributions. This thesis focuses on active learning for imbalanced data distributions and proposes an active sampling strategy for this setting. The method consists of two parts, which together address the problems that active learning algorithms encounter on imbalanced datasets and improve model performance.

First, we address how the loss function is computed. To handle the imbalanced data distribution, we partition the training set by category and propose a new loss function that reshapes the traditional loss according to the partitioning results. The core idea is to make full use of the information in both labeled and unlabeled examples: using thresholds together with the examples' predicted probabilities, the loss lowers the risk (weighted loss) incurred when an example is misclassified as the minority class and raises the risk incurred when an example is misclassified as the majority class. This reduces the interference of redundant examples with the model and improves its accuracy. This part tackles imbalanced classification from an algorithmic perspective.

Second, we propose an adaptive active sampling strategy based on this loss function. The strategy switches between sampling strategies adaptively according to the imbalance ratio between labeled positive and negative examples. It selects suitable unlabeled examples for labeling and adds them to the training set, so that the numbers of labeled positive and negative samples stay balanced during training. This reduces the model's frequent queries of majority-class samples, avoids ineffective sampling during active learning, and makes training more efficient and accurate. This part tackles imbalanced classification from a data perspective.

We verified the effectiveness of the proposed method on binary-label imbalanced datasets and compared it with other active learning methods. The results show that the proposed method significantly improves performance on imbalanced classification tasks and outperforms the compared methods in classification accuracy and other metrics.
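The two components described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact loss or the exact switching rule, so both are assumptions here: `weighted_bce` is a standard class-weighted (cost-sensitive) binary cross-entropy standing in for the thesis's reshaped loss, and `select_query` uses a hypothetical rule that switches from uncertainty sampling to minority-seeking sampling when labeled negatives outnumber positives by more than `max_ratio`. All function names and parameter values (`w_pos`, `w_neg`, `max_ratio`) are illustrative and not taken from the thesis.

```python
import math
from typing import Sequence


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 w_pos: float = 5.0, w_neg: float = 1.0,
                 eps: float = 1e-7) -> float:
    """Class-weighted binary cross-entropy (a standard cost-sensitive loss,
    used here as a stand-in; NOT the thesis's exact reshaped loss).

    labels: 1 = minority (positive) class, 0 = majority (negative) class.
    probs:  predicted probability of the minority class.
    A larger w_pos makes predicting a minority example as the majority class
    more costly than a false alarm on a majority example.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / len(labels)


def select_query(unlabeled_probs: Sequence[float], labels: Sequence[int],
                 max_ratio: float = 2.0) -> int:
    """Return the index of the next unlabeled example to send for labeling.

    Hypothetical switching rule (an assumption, not from the thesis):
    if labeled negatives outnumber positives by more than max_ratio,
    query the example most likely to be positive; otherwise fall back
    to uncertainty sampling (probability closest to 0.5).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    ratio = n_neg / max(n_pos, 1)  # guard against zero positives
    indices = range(len(unlabeled_probs))
    if ratio > max_ratio:
        # Labeled pool is imbalanced: look for likely minority examples.
        return max(indices, key=lambda i: unlabeled_probs[i])
    # Labeled pool is roughly balanced: query where the model is least certain.
    return min(indices, key=lambda i: abs(unlabeled_probs[i] - 0.5))
```

For example, with unlabeled probabilities `[0.1, 0.55, 0.9, 0.3]` and a labeled pool of five negatives and one positive, `select_query` returns index 2 (the most likely minority example); with a balanced pool `[0, 0, 1, 1]` it returns index 1 (the probability closest to 0.5). Switching on the labeled imbalance ratio keeps the query budget from being spent mostly on majority-class samples, which is the behavior the abstract describes.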
Keywords/Search Tags: Active learning, Imbalanced distribution, Risk minimization, Convolutional neural network