Research On Classification Algorithms For Imbalanced Data

Posted on: 2023-04-03

Degree: Doctor

Type: Dissertation

Country: China

Candidate: C X Cui

Full Text: PDF

GTID: 1528307022981749

Subject: Computer Science and Technology

Abstract/Summary:

The rapid development of new-generation information technology and its wide application in various fields have triggered explosive growth in data. Exploring the information contained in this vast amount of data and exploiting its value is an essential task at this stage. As an important task in data mining, data classification has significant research value. However, in many practical problems class imbalance is inevitable in both structured and unstructured data, which makes classification difficult. Imbalanced classification algorithms have recently made a series of significant advances in theory, methods, and applications, but they still face three challenges: class overlap and intra-class imbalance, insufficient representation capability for the minority class, and a lack of supervisory information. Focusing on these challenges, this thesis studies classification algorithms for imbalanced data. The main research results are as follows:

(1) To address class overlap and intra-class imbalance in traditional imbalanced data classification, we propose an adaptive undersampling-based imbalanced data classification algorithm. First, we use a nearest neighbor search algorithm to identify the majority class samples in the overlap area and remove them. Then, an improved density peak clustering is applied to automatically obtain multiple sub-clusters with different shapes, sizes, and densities. Finally, sampling weights are calculated according to the densities of the samples in the sub-clusters, and undersampling is performed according to these weights. The classifiers trained on the resulting balanced datasets are integrated by bagging. Experiments show that the proposed adaptive undersampling method based on density peak clustering can significantly improve the performance of imbalanced data classification compared with existing undersampling methods.

(2) To address insufficient minority class representation in imbalanced node classification, we propose a hybrid sampling-based graph contrastive learning algorithm for imbalanced node classification. The core of this algorithm is to balance the negative sample set using hybrid sampling, so that the different classes of samples are balanced. This enhances the representation of minority class nodes and thus improves the performance of imbalanced node classification. Extensive experimental results show that the method improves classification performance compared with graph contrastive learning and obtains results superior to those of other state-of-the-art imbalanced node classification methods.

(3) To address the lack of supervisory information in imbalanced node classification, we propose a self-supervised learning-based algorithm for imbalanced node classification. Self-supervised learning serves two purposes in this algorithm: it expands the supervisory information, and it enhances the expressive ability of the node representations. In addition, a semantic constraint loss is designed to ensure semantic consistency in graph data augmentation with respect to the cross-entropy loss and the self-supervised contrastive loss. Experimental results on real graph datasets show that the proposed algorithm obtains discriminative representations that are more effective for the imbalanced node classification task.

In conclusion, this thesis proposes a series of imbalanced classification algorithms based on adaptive undersampling, graph contrastive learning, and self-supervised learning to address the challenges faced by imbalanced classification: class overlap and intra-class imbalance, insufficient representation ability for minority classes, and a lack of supervisory information. These algorithms provide new methods and ideas for imbalanced data classification, and the results have theoretical significance and application value for the analysis and mining of imbalanced data.
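The undersampling steps described in contribution (1) can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes a binary problem with label 0 as the majority class, a majority-vote rule over the k nearest neighbours to decide what counts as "overlap", and sampling weights proportional to local density (the abstract does not state the direction of the weighting). The improved density peak clustering and the bagging step are omitted: the cleaned majority class is treated as a single cluster. All function names are hypothetical.

```python
import math
import random


def knn_indices(points, query, k, skip=None):
    """Indices of the k points nearest to `query` (Euclidean), skipping one index."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(points[i], query),
    )
    return order[:k]


def remove_overlapping_majority(X, y, majority=0, k=3):
    """Drop majority samples whose k nearest neighbours are mostly minority.

    Assumption: a simple majority vote defines the overlap area.
    """
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        if label == majority:
            nbrs = knn_indices(X, x, k, skip=i)
            minority_votes = sum(1 for j in nbrs if y[j] != majority)
            if minority_votes > k / 2:
                continue  # sample lies in the overlap area
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def density_weights(points, k=3):
    """Normalised weights: inverse mean distance to the k nearest neighbours.

    Assumption: denser samples receive larger sampling weights.
    """
    weights = []
    for i, p in enumerate(points):
        nbrs = knn_indices(points, p, k, skip=i)
        if not nbrs:  # a single point has no neighbours
            weights.append(1.0)
            continue
        mean_d = sum(math.dist(points[j], p) for j in nbrs) / len(nbrs)
        weights.append(1.0 / (mean_d + 1e-12))
    total = sum(weights)
    return [w / total for w in weights]


def weighted_undersample(points, weights, n, rng):
    """Draw up to n distinct points with probability proportional to weight."""
    pool = list(range(len(points)))
    w = list(weights)
    chosen = []
    for _ in range(min(n, len(pool))):
        pick = rng.choices(range(len(pool)), weights=w, k=1)[0]
        chosen.append(pool.pop(pick))
        w.pop(pick)
    return [points[i] for i in chosen]


def balanced_subset(X, y, majority=0, k=3, seed=0):
    """Clean the overlap area, then undersample the majority to the minority size."""
    rng = random.Random(seed)
    Xc, yc = remove_overlapping_majority(X, y, majority, k)
    maj = [x for x, label in zip(Xc, yc) if label == majority]
    mino = [(x, label) for x, label in zip(Xc, yc) if label != majority]
    sampled = weighted_undersample(maj, density_weights(maj, k), len(mino), rng)
    X_bal = sampled + [x for x, _ in mino]
    y_bal = [majority] * len(sampled) + [label for _, label in mino]
    return X_bal, y_bal
```

Calling `balanced_subset` with several different seeds yields several balanced training sets, one per base classifier, which is where a bagging ensemble as described in the abstract would plug in.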

Keywords/Search Tags:

Imbalanced data classification, Undersampling, Hybrid sampling, Graph contrastive learning, Self-supervised learning

Related items

1	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
2	Research On Neighborhood-aware Imbalanced Data Sampling Classification
3	Research On Under-sampling Classification Method Of Unbalanced Data
4	Research On Semi-supervised Graph Learning Algorithm Based On Label Augmentation And Contrastive Learning
5	Hybrid Ensemble Learning For Imbalanced Data
6	Imbalanced Data Classification Algorithm Based On Unsupervised Intelligent Under Sampling Method
7	Research On Under-sampling Algorithm For Imbalanced Data Based On Clustering And Its Application
8	Researches On Graph Representation Learning Based On Contrastive Learning
9	Imbalanced Data Classification Based On ESLEBS And Improved Linear Neighborhood Similarity
10	Classify Financial Documents Via Graph Representation Learning Based On Momentum Contrast