An Imbalanced Data Classification Algorithm Combining Clustering With Sampling Strategy

Posted on: 2019-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2348330545995985
Subject: Software engineering
Abstract/Summary:
In practical applications, many datasets are imbalanced. The minority samples usually have higher research value, and misclassifying minority samples as the majority class causes greater losses in practice than misclassifying majority samples as the minority class. Many improved strategies for imbalanced classification have been proposed. This thesis proposes an imbalanced classification method that combines clustering with a sampling strategy. First, the dataset is converted into two classes, positive and negative, and spectral clustering is applied to the majority class. Then AdaBoost-SVM is trained on the combinations of each resulting majority subset with the minority class. In each iteration, misclassified samples are assigned greater weight, which increases their probability of being selected for the next round of training, and the training sets are reselected according to these weights. KNN is used to remove misclassified minority samples that meet the removal criterion, and new minority samples are synthesized between the misclassified samples and their nearest minority neighbors. The process repeats until the specified number of iterations is reached. The method is evaluated on comparison datasets and on telecommunication datasets. Experimental results show that the proposed algorithm improves classification on imbalanced datasets, and its learning performance is better than that of several previously proposed algorithms.
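The abstract gives no code, so the following is only a minimal plain-Python sketch, under stated assumptions, of two steps of the iteration it describes: an AdaBoost-style weight update followed by weighted resampling of the training set, and the KNN-based handling of misclassified minority samples. Spectral clustering and the SVM base learner are omitted. The function names (`reweight`, `resample`, `nearest`, `clean_and_synthesize`) are hypothetical, and so is the noise rule: the abstract only says that "qualified" misclassified minority samples are removed, so the sketch assumes a sample qualifies when all of its k nearest neighbors are majority samples. The synthesis step is SMOTE-style linear interpolation, which may differ from the thesis's exact procedure.

```python
import math
import random


def reweight(weights, wrong):
    """AdaBoost-style update that raises the weight of misclassified samples.

    `weights` should sum to 1; `wrong[i]` is True if sample i was misclassified.
    Assumes the weighted error is below 0.5, so alpha > 0.
    """
    err = sum(w for w, bad in zip(weights, wrong) if bad)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(alpha if bad else -alpha) for w, bad in zip(weights, wrong)]
    total = sum(new)
    return [w / total for w in new], alpha


def resample(weights, rng):
    """Draw a new training set (indices, with replacement) in proportion to weight."""
    return rng.choices(range(len(weights)), weights=weights, k=len(weights))


def nearest(x, pool, k):
    """Positions in `pool` of the k points closest to x (Euclidean distance)."""
    return sorted(range(len(pool)), key=lambda i: math.dist(x, pool[i]))[:k]


def clean_and_synthesize(X, y, wrong, k=3, rng=None):
    """Handle misclassified minority points (label 1 = minority, 0 = majority).

    Hypothetical rule: a misclassified minority point whose k nearest neighbors
    are all majority points is dropped as noise; otherwise one synthetic point is
    interpolated between it and a random one of its k nearest minority neighbors.
    """
    if rng is None:
        rng = random.Random(0)
    drop, synthetic = set(), []
    for i, bad in enumerate(wrong):
        if not bad or y[i] != 1:
            continue  # only misclassified minority points are handled here
        others = [j for j in range(len(X)) if j != i]
        neigh = [others[p] for p in nearest(X[i], [X[j] for j in others], k)]
        if all(y[j] == 0 for j in neigh):
            drop.add(i)  # surrounded by majority points: treat as noise
            continue
        # At least one neighbor is minority, so `mates` is non-empty here.
        mates = [j for j in range(len(X)) if y[j] == 1 and j != i]
        m = mates[rng.choice(nearest(X[i], [X[j] for j in mates], k))]
        gap = rng.random()  # position along the segment from X[i] to X[m]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[m])))
    X_out = [x for i, x in enumerate(X) if i not in drop] + synthetic
    y_out = [c for i, c in enumerate(y) if i not in drop] + [1] * len(synthetic)
    return X_out, y_out
```

In a full loop, one would train the base learner on the indices returned by `resample`, derive `wrong` from its predictions, call `reweight`, and pass the minority errors to `clean_and_synthesize`. Because that last step changes the number of samples, the weights must be realigned afterwards; the abstract does not say how the thesis does this, so the sketch leaves it open.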
Keywords/Search Tags: Imbalanced classification, Spectral clustering, AdaBoost, Misclassified samples