Neural Network Approaches For Imbalanced Data Classification

Posted on: 2024-02-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z A Huang
GTID: 1528307166499254
Subject: Doctor of Engineering
Abstract/Summary:
In recent years, neural networks have attracted much attention because of their excellent data-fitting ability. However, the optimization algorithms used to train neural networks usually carry an implicit prior assumption that the data are balanced. When the data are imbalanced, this assumption is violated, which biases what the network learns. Most existing neural network methods for imbalanced data rely on resampling or reweighting techniques that rebalance the known data to satisfy this prior assumption. These methods usually ignore the essence of data imbalance, namely the insufficient empirical representation caused by the small number of positive samples. In imbalanced data classification with neural networks, insufficient empirical representation makes the rotation and scaling of gradients uncertain during learning, so resampling and reweighting methods match the operating characteristics of the optimization algorithm poorly. In addition, the compact representation problem in neural network learning makes insufficient empirical representation harder to address. Therefore, to improve the scalability and practicability of neural networks, this dissertation proposes neural network approaches for imbalanced data classification. The main contributions are as follows:

(1) Starting from the principle of the optimization algorithm, and targeting the uncertainty in gradient rotation and gradient scaling, a neural network approach with controllable gradient rotation is proposed for imbalanced data classification. Specifically, the gradient scaling problem is solved by constraining the gradients of the positive and negative classes to unit gradients. The unit gradient constraint restores the ability of neural networks to learn from imbalanced data. In addition, to address the uncertainty of gradient rotation, a controllable gradient rotation strategy is proposed to constrain the direction of gradient rotation, which achieves a locally controllable expansion of the distribution region of known positive samples on any imbalanced dataset.

(2) In terms of data resampling, existing undersampling strategies are not coupled to the neural network optimization algorithm. To address this, a neural network method with preference undersampling is proposed for imbalanced data classification. Specifically, preference undersampling consists of neural network undersampling and boundary expansion for the positive class. Neural network undersampling addresses the gradient inundation problem that arises when neural networks learn from imbalanced data, while boundary expansion for the positive class achieves a local expansion of the region of known positive samples. In this way, a best-matching relationship between the neural network optimization algorithm and the undersampling strategy is established.

(3) Since existing oversampling methods cannot fundamentally solve the insufficient empirical representation of the positive class, a neural network approach with absent positive sample oversampling is proposed for imbalanced data classification. Specifically, a sampling algorithm is used to sample negative samples, which have sufficient empirical representation, and the samples rejected during the sampling process are collected as candidate absent positive samples. Then, to speed up the sampling process, a line segment transition kernel is proposed to shrink the proposal state transition space. To alleviate the unreliable class attribute definition caused by inaccurate probability estimation in high-dimensional data space, a class probability constraint is proposed to improve the quality of the sampled absent positive samples. Additionally, in line with the optimization algorithm of neural networks, local boundary shifting and global boundary shifting strategies are proposed to make use of the sampled absent positive samples. From the data point of view, the proposed method overcomes the limitation that the empirical distribution stays the same before and after oversampling, and it effectively expands the empirical distribution region of known positive samples.

(4) From the perspective of the learning strategy of the optimization algorithm, and targeting the compact representation problem of the one-dimensional linear attribute space defined by the cross-entropy loss function, a new attribute space for imbalanced data is defined on a two-dimensional plane. To solve the hypersphere collapse, ambiguous inter-class relationship, and compact representation problems, a neural network method with specific data information learning is proposed for imbalanced data classification. Specifically, following the hypersphere-based paradigm of one-class learning, a neural network maps the known data to two non-overlapping regions, one positive and one negative, of the class attribute space of imbalanced data. Then, to alleviate the performance degradation on positive samples caused by the compact representation problem, a dynamic information potential energy is proposed to disperse the negative samples in the class attribute space as much as possible. From the perspective of the optimization algorithm, the proposed method provides a specific data information learning mode in the attribute space of imbalanced data and effectively shrinks the non-negative regions in the data space.

(5) Finally, as a practical application, a weighted rebalancing neural network method is derived from the idea of controllable gradient rotation for the multi-class data imbalance problem in electrocardiogram (ECG) classification. Specifically, to handle the similarities and differences among the data, a one-dimensional convolutional neural network is designed to automatically extract robust category features. For the multi-class data imbalance problem, a weighted rebalancing loss function strategy is derived to balance false positive and false negative predictions. The proposed method provides an automated algorithmic scheme for ECG classification, which can serve as a clinical aid to reduce clinical diagnostic pressure and the misdiagnosis rate.
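
The unit gradient constraint described in contribution (1) can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes a plain logistic model, per-class summed gradients, and a toy dataset, and the names `class_gradient`, `unit`, and `unit_gradient_step` are hypothetical. The controllable gradient rotation strategy is not reproduced, because the abstract does not specify it. The sketch only shows that rescaling each class gradient to unit length stops the majority class from dominating the update.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def class_gradient(w, samples, label):
    """Summed logistic-loss gradient over the samples of a single class."""
    grad = [0.0] * len(w)
    for x in samples:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            grad[i] += (p - label) * xi
    return grad

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def unit(v, eps=1e-12):
    """Rescale v to (approximately) unit length; eps guards against a zero vector."""
    n = norm(v)
    return [c / (n + eps) for c in v]

def unit_gradient_step(w, positives, negatives, lr=0.1):
    """One descent step in which each class contributes a unit-length gradient."""
    g_pos = unit(class_gradient(w, positives, 1.0))
    g_neg = unit(class_gradient(w, negatives, 0.0))
    return [wi - lr * (gp + gn) for wi, gp, gn in zip(w, g_pos, g_neg)]

# Toy imbalanced data (an assumption for illustration): 2 positives, 20 negatives.
# The last feature is a constant bias term.
positives = [[2.0, 2.0, 1.0], [2.5, 1.5, 1.0]]
negatives = [[-1.0 - 0.1 * i, -1.0 + 0.05 * i, 1.0] for i in range(20)]

w = [0.0, 0.0, 0.0]
# Without the constraint, the summed negative-class gradient is far larger.
raw_pos = norm(class_gradient(w, positives, 1.0))
raw_neg = norm(class_gradient(w, negatives, 0.0))

for _ in range(50):
    w = unit_gradient_step(w, positives, negatives)
```

Comparing `raw_pos` with `raw_neg` shows the scale gap between the two classes before normalization. After normalization both classes pull on the weights with equal magnitude, so only the gradient directions differ.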
Keywords/Search Tags: neural network, imbalanced data classification, insufficient empirical representation, controllable gradient rotation, preference undersampling, absent positive sample oversampling, specific data information learning