Imbalanced Data Classification Algorithm Based On Unsupervised Intelligent Under-Sampling Method

Posted on:2020-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Luo

Full Text:PDF

GTID:2428330596986790

Subject:Applied statistics

Abstract/Summary:

The rapid development of information technology has ushered in an era in which data are produced, shared and applied on a massive scale, and machine learning is the cornerstone for extracting value from this ocean of data. Classification is one of the most important problems in the field. General-purpose classification algorithms implicitly assume that the different classes contain comparable numbers of instances and that the costs of misclassifying them are also comparable. In practice, however, many datasets are highly imbalanced: one class has far more samples than the others, which makes it difficult for general classification methods to achieve good results. To improve classification performance on imbalanced data, researchers in China and abroad have carried out extensive studies, which can be roughly grouped into three levels. First, data reconstruction before model building, mainly using resampling techniques such as under-sampling and over-sampling to reduce the imbalance between classes. Second, modifying the classification algorithm to suit the characteristics of imbalanced datasets, for example by assigning different weights to samples of different classes during learning or by introducing perturbations into samples of several classes. Third, combining the first two approaches. Addressing the particular characteristics of imbalanced datasets, this thesis proposes a new intelligent under-sampling method based on unsupervised learning and combines it with an ensemble learning algorithm to better solve the imbalanced classification problem. The main work of this thesis is as follows:

1. Problem analysis: analyzing why traditional classification algorithms fail on imbalanced data, and examining the principles and ideas behind existing methods and techniques to identify problems that remain unsolved.
2. Data reconstruction: inspired by grey system theory, a new under-sampling method is proposed to address the problems of previous resampling techniques. It uses KNN to discover the internal structure of the samples, repeatedly eliminating redundant samples and retaining representative ones until all classes contain the same number of samples.
3. Algorithm integration: comparing and analyzing the characteristics and performance of commonly used classification methods, then integrating the Bagging and SVM algorithms to classify the reconstructed data.
4. Multi-class classification: studying common strategies for multi-class classification problems and extending the proposed method to multi-class imbalanced datasets.
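The abstract does not spell out the elimination rule or how grey system theory enters it, so the Python sketch below only illustrates the general idea of KNN-guided removal of redundant majority-class samples until the classes are equal in size. Its redundancy score, the mean distance from a majority sample to its `k` nearest majority-class neighbours, is an assumption chosen for simplicity and is not the thesis's criterion; the names `knn_undersample` and `_redundancy` are hypothetical.

```python
import math
from collections import Counter


def _redundancy(points, i, k):
    """Mean distance from points[i] to its k nearest neighbours in `points`.

    Hypothetical score: a small value means the sample sits in a dense
    region that other samples already describe, so it is treated as redundant.
    """
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def knn_undersample(X, y, k=3):
    """Hypothetical KNN-based under-sampling for two classes.

    Repeatedly removes the most redundant majority-class sample until the
    majority and minority classes have the same number of samples.
    """
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two classes")
    (maj, _), (mino, n_min) = counts.most_common()
    majority = [x for x, label in zip(X, y) if label == maj]
    minority = [x for x, label in zip(X, y) if label == mino]
    while len(majority) > n_min:
        # Drop the sample whose neighbourhood is densest (lowest score).
        victim = min(range(len(majority)), key=lambda i: _redundancy(majority, i, k))
        majority.pop(victim)
    X_bal = majority + minority
    y_bal = [maj] * len(majority) + [mino] * len(minority)
    return X_bal, y_bal
```

For example, with four nearly identical majority points near the origin, one isolated majority point at `(5, 5)` and two minority points, the near-duplicates are removed first and the isolated point survives as a representative. Each removal recomputes all pairwise distances, so the sketch is only practical for small datasets; the balanced output is grouped by class and should be shuffled before training if order matters.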

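The algorithm-integration and multi-class items of the abstract can be illustrated with a minimal standard-library sketch: Bagging over linear SVMs, extended to several classes with one-vs-rest, which is one common multi-class strategy. The abstract does not say which SVM kernel, aggregation rule or multi-class strategy the thesis uses, so the following are assumptions: each SVM is linear and trained with the Pegasos sub-gradient method, the ensemble averages its members' decision values, and all function names are hypothetical.

```python
import random


def _augment(x):
    # Append a constant feature so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM; y must be in {-1, +1}."""
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regularisation shrink
            if margin < 1.0:  # hinge loss active: step towards this sample
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def decision(w, x):
    """Signed SVM score of one sample (positive means the +1 class)."""
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))


def fit_bagged_svm(X, y, n_estimators=11, seed=0):
    """Bagging: each SVM is trained on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for m in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=seed + m))
    return models


def bagged_score(models, x):
    """Aggregate the ensemble by averaging the members' decision values."""
    return sum(decision(w, x) for w in models) / len(models)


def fit_one_vs_rest(X, y, **kwargs):
    """One bagged binary ensemble per class: class k -> +1, all other classes -> -1."""
    return {k: fit_bagged_svm(X, [1 if label == k else -1 for label in y], **kwargs)
            for k in sorted(set(y))}


def predict(ensembles, x):
    """Return the class whose ensemble gives the highest averaged score."""
    return max(ensembles, key=lambda k: bagged_score(ensembles[k], x))
```

For a two-class problem, `fit_bagged_svm` and the sign of `bagged_score` are enough. Each one-vs-rest subproblem pits one class against all the others and is therefore itself imbalanced; this sketch fits those subproblems as they are and does not rebalance them first.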
Keywords/Search Tags:

machine learning, classification, imbalanced data, sampling technology, unsupervised learning, ensemble learning

Related items

1	Hybrid Ensemble Learning For Imbalanced Data
2	Research On Ensemble And Imbalanced Based Supervised/Unsupervised Learning Methods And Application
3	Research On Imbalanced Data Classification Algorithms Based On Ensemble Learning
4	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
5	Research On Binary Imbalanced Large Data Classification And Its Application
6	Research On Imbalanced Data Classification Based On Sampling Method And Ensemble Learning
7	Research On Unbalanced Learning Based On Sampling Method
8	Research On Ensemble Learning Algorithm For Imbalanced Data
9	Classification In Imbalanced Data Based On Over-Sampling And Ensemble Learning
10	Research And Application Of Imbalanced Data Classification Algorithm Based On Ensemble Learning