An Imbalanced Data Classification Algorithm Combining Clustering With Sampling Strategy

Posted on: 2019-04-06

Degree: Master

Type: Thesis

Country: China

Candidate: C Zhang

GTID: 2348330545995985

Subject: Software engineering

Abstract/Summary:

In practical applications, many data sets are imbalanced. The minority samples usually have higher research value: misclassifying a minority sample as the majority class causes greater losses in practice than misclassifying a majority sample as the minority class. Many improved strategies exist for imbalanced classification. This thesis proposes a method for imbalanced classification that combines clustering with a sampling strategy. First, the data sets are converted into two classes, positive and negative, and spectral clustering is applied to the majority class to divide it into subsets. Adaboost-SVM is then trained on different combinations of each subset with the minority class. On each iteration, wrongly classified samples are assigned greater weight, which increases their probability of being selected for the next round of training, and the training sets are reselected according to these weights. KNN is applied to remove the wrongly classified minority samples that qualify for removal, and new minority samples are synthesized between the misclassified samples and their nearest minority samples. The process ends when the specified number of iterations is reached. The method is applied to comparative data sets and telecommunication data sets. Experimental results show that the proposed algorithm improves classification results on imbalanced data sets, and its learning performance is better than that of some previously proposed algorithms.
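The reweighting and synthesis steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the fixed weight multiplier `factor=2.0`, and the toy data are all assumptions made for illustration. AdaBoost derives its weight update from the classifier error rather than a fixed factor, and the spectral clustering, SVM training, and KNN-based removal steps are omitted here. Only the Python standard library is used.

```python
import math
import random


def boost_weights(weights, misclassified, factor=2.0):
    """Raise the weight of each misclassified index, then renormalize to sum to 1.

    `factor` is a hypothetical fixed multiplier, not the AdaBoost update rule.
    """
    raised = [w * factor if i in misclassified else w
              for i, w in enumerate(weights)]
    total = sum(raised)
    return [w / total for w in raised]


def resample(samples, weights, k, rng):
    """Reselect a training set of size k, drawing with replacement by weight."""
    return rng.choices(samples, weights=weights, k=k)


def nearest_minority(sample, minority):
    """Return the minority point closest to `sample`, excluding equal points."""
    candidates = [m for m in minority if m != sample]
    return min(candidates, key=lambda m: math.dist(sample, m))


def synthesize(sample, neighbor, rng):
    """Create a new point at a random position on the segment sample -> neighbor."""
    gap = rng.random()
    return tuple(s + gap * (n - s) for s, n in zip(sample, neighbor))


# Toy usage: four samples with equal weights, sample 1 was misclassified.
rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
weights = boost_weights([0.25] * 4, misclassified={1})
training_set = resample(["a", "b", "c", "d"], weights, k=4, rng=rng)
neighbor = nearest_minority(minority[0], minority)
new_point = synthesize(minority[0], neighbor, rng)
```

Interpolating between a misclassified minority sample and its nearest minority neighbor places the synthetic point inside the region the classifier currently gets wrong, which is the stated motivation for synthesizing around misclassified samples rather than around arbitrary minority samples.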

Keywords/Search Tags:

Imbalanced classification, Spectral clustering, Adaboost, Misclassified samples

Related items

1	Research On Imbalanced Data Classification In Financial Field
2	Research On Imbalanced Data Classification Methods For Unsafe Samples
3	Research On Over Sampling Algorithm Oriented To Subdivision Of Minority Class Samples In Imbalanced Data Set
4	Research On Image Classification Algorithm Based On Imbalanced Samples
5	Research Of Imbalanced Data Classification Method Based On Oversampling And Ensemble Learning
6	The Imbalanced Learning Method Of Optimal Margin Distribution Machine And Its Industrial Application
7	Research On Classification Based On Clustering For Imbalanced Dataset
8	The Application Of Improved AdaBoost Algorithm Based On Cost Sensitive In Imbalanced Data
9	Research On Imbalanced Classification Problem Based On Random Forest-Adaboost
10	Research On Network Security Based On Imbalanced Data Classification