Extremely Imbalanced And Overlapped Data Classification

Posted on:2022-09-04

Degree:Master

Type:Thesis

Country:China

Candidate:S T Gao

Full Text:PDF

GTID:2518306479493884

Subject:Software Engineering

Abstract/Summary:

With the growing capacity of data storage and advances in data generation and collection technologies, large volumes of data have become available in real-world applications. Among them, datasets with imbalanced class distributions are widespread, and canonical classifiers applied to them often fail because they assume that every class has roughly the same number of instances and that all misclassifications carry the same cost. How to mine information from imbalanced data and build models on it has attracted rising attention from researchers, and a large number of approaches have since been proposed. However, most of these models perform poorly on datasets that combine a high class imbalance ratio, class overlap and noisy data. In this thesis, we examine, from the perspective of the data, the bias toward the majority class, the information loss and the overfitting that arise in imbalanced data classification. We explore, respectively, the application of self-paced learning to imbalanced data classification and the importance of instances in the class-overlapping region. We propose a novel framework called DAPS (DynAmic self-Paced enSemble) that consists of two key steps: (1) reasonable and effective sampling that maximizes the use of informative instances and avoids serious information loss; and (2) assigning appropriate instance weights to address noisy data and model overfitting. The main contributions of this thesis are summarized as follows.

1. A dynamic self-paced sampling mechanism for training-sample selection, which selects the most reasonable and effective instances during training, maximizes instance utilization, and avoids overfitting and information loss. A dedicated measure is used to compute classification difficulty under different classifiers and different data distributions.
2. An instance weighting mechanism for class overlap and noise, which identifies instances in the class-overlapping region and assigns different weights to different instances, strengthening attention to important instances and weakening the learning of noisy data.
3. A novel framework, DAPS (DynAmic self-Paced enSemble), for classifying highly imbalanced, class-overlapped and low-quality data. Most existing canonical classifiers (e.g., Decision Tree, Random Forest, GBDT) can be integrated into DAPS. Comprehensive experiments on synthetic data and three real-world datasets show that DAPS achieves considerable accuracy improvements over a broad range of models.
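
The abstract does not spell out how DAPS measures classification difficulty or turns it into a sampling schedule. As a point of reference only, the sketch below shows a related published technique, the hardness-binned self-paced under-sampling used by Self-Paced Ensemble (SPE), in plain Python. It is not the DAPS mechanism: the absolute-error hardness function, the number of bins, the `tan` schedule for the self-paced factor and the function names `hardness` and `self_paced_undersample` are assumptions chosen for illustration. Given the hardness of every majority-class instance under the current ensemble, it returns the majority indices to pair with the minority class in the next round.

```python
import math
import random


def hardness(prob_minority, label):
    """Absolute-error hardness (an assumed choice): distance between the
    ensemble's predicted minority-class probability and the 0/1 label."""
    return abs(prob_minority - label)


def self_paced_undersample(maj_hardness, n_minority, iteration, n_iterations,
                           n_bins=10, rng=None):
    """Hypothetical SPE-style sampler, not the DAPS mechanism.

    maj_hardness: hardness of each majority instance under the current ensemble.
    Returns distinct indices into maj_hardness, roughly n_minority of them.
    """
    rng = rng or random.Random(0)
    lo, hi = min(maj_hardness), max(maj_hardness)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal hardness
    bins = [[] for _ in range(n_bins)]
    for idx, h in enumerate(maj_hardness):
        bins[min(int((h - lo) / width), n_bins - 1)].append(idx)

    # Self-paced factor (assumed schedule): 0 in the first round, growing
    # without bound as iteration approaches n_iterations.
    alpha = math.tan(math.pi * iteration / (2 * n_iterations))

    # Per-bin weight 1 / (mean hardness + alpha); empty bins get no budget.
    weights = []
    for members in bins:
        if members:
            mean_h = sum(maj_hardness[i] for i in members) / len(members)
            weights.append(1.0 / (mean_h + alpha + 1e-12))
        else:
            weights.append(0.0)
    total = sum(weights)

    # Split the minority-sized budget across bins, sampling without replacement.
    chosen = []
    for members, w in zip(bins, weights):
        k = min(len(members), round(n_minority * w / total))
        chosen.extend(rng.sample(members, k))
    return chosen
```

In a full ensemble loop, each round would train a base classifier on all minority instances plus the returned majority indices, update the ensemble's predicted probabilities, and recompute hardness before the next call. With alpha near zero the per-bin weight is roughly 1/h, which evens out the total hardness each bin contributes; as alpha grows the weights flatten toward uniform across bins, so the sparsely populated hard bins receive a larger share relative to their size.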
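
The abstract likewise leaves the overlap-detection and weighting rule unspecified. One common heuristic, swapped in here purely as an illustration and not taken from the thesis, inspects each instance's k nearest neighbours: if none carry another label the point is treated as safe, if all do it is treated as noise, and anything in between is treated as lying in the overlapping region. The function name `knn_instance_weights` and the weight values 1.0, `overlap_weight=2.0` and `noise_weight=0.1` are hypothetical.

```python
import math


def knn_instance_weights(X, y, k=5, overlap_weight=2.0, noise_weight=0.1):
    """Hypothetical k-NN weighting rule, not the DAPS mechanism.

    X: list of feature tuples; y: list of 0/1 labels.
    r = share of the k nearest neighbours whose label differs from the point's:
      r == 0    -> safe interior point, weight 1.0
      r == 1    -> surrounded by the other class, treated as noise
      otherwise -> in the overlapping region, weight overlap_weight
    """
    weights = []
    for i, xi in enumerate(X):
        # Brute-force search: distances to every other point, nearest first.
        neigh = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        r = sum(1 for _, j in neigh if y[j] != y[i]) / len(neigh)
        if r == 0:
            weights.append(1.0)
        elif r == 1:
            weights.append(noise_weight)
        else:
            weights.append(overlap_weight)
    return weights
```

Weights of this kind can be passed to the `sample_weight` argument of `fit` in scikit-learn's `DecisionTreeClassifier`, `RandomForestClassifier` or `GradientBoostingClassifier`, i.e. the kind of base learners the abstract says DAPS can host. The neighbour search here is quadratic in the number of instances, so a spatial index would be needed for large datasets.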

Keywords/Search Tags:

classification, imbalanced data, overlapped data

Related items

1	The Research Of Imbalanced Data Classification
2	The Algorithm Research Of Associative Classification And Classification Based On Imbalanced Data
3	Research On The Classification Algorithm Of Imbalanced Data Sets
4	Research On Methods For Imbalanced Data Classification
5	Research On Application Of Classification Algorithms For Imbalanced Data
6	Research On Classification Method For Imbalanced Data Sets And Its Application
7	Research And Application Of Imbalanced Data Classification Algorithm
8	Complaints Text Classification Research Of Imbalanced Data Sets
9	Research On Classification Methods For Large-scale Imbalanced Data
10	Research On Imbalanced Data Classification Methods Based On Probabilistic Oversampling