Research On Classification Method Of Imbalanced Data Sets

Posted on:2020-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:S L Liu

Full Text:PDF

GTID:2428330575956642

Subject:Mathematics

Abstract/Summary:

Classification plays an important role in data mining because it helps uncover valuable underlying information. However, class imbalance usually degrades classification performance: on imbalanced data sets, traditional algorithms often struggle to classify the minority class well. It is therefore necessary to study classification methods for imbalanced data sets. This thesis proposes algorithms that improve classification performance on imbalanced data sets from three angles: data sampling, feature selection, and the classification algorithm.

(1) As for sampling in data pre-processing, appropriate resampling before classifying an imbalanced data set can have a positive effect. To overcome the defect that the Borderline-SMOTE oversampling algorithm ignores the optimization of the majority class, the thesis presents the RENN-BSMOTE resampling algorithm for imbalanced data. By analysing the class distribution of each sample's nearest neighbors, samples with poor similarity to their neighbors are deleted iteratively. In addition, the Borderline-SMOTE algorithm is used to oversample the minority class. Experimental comparisons with other algorithms illustrate the superiority of the new RENN-BSMOTE algorithm.

(2) In terms of feature selection, proper feature selection generally benefits classification. Since Relief feature selection methods neglect the imbalanced characteristics of the data, this thesis proposes the improved DK-ReliefF algorithm. It sets different numbers of nearest neighbors for the same class and for different classes, and updates the feature weight vector according to the distances between the samples and their neighbors. Finally, the features with high discriminative ability are selected for use in classification. Comparing DK-ReliefF with other algorithms shows that it effectively improves classification performance.

(3) In terms of the classification algorithm, KNN is a widely used classification algorithm. One of the problems KNN faces is how to decide the appropriate number of nearest neighbors. Based on the previous work, a new PTM-DWKNN classification algorithm that selects the optimal local k value is proposed to suit imbalanced data sets. The optimal local k value is obtained from the local characteristics of the samples. In addition, because neighbors at different distances from the test sample have different influence, the samples to be classified are labeled by weighted voting. Finally, comparative experiments show that the PTM-DWKNN algorithm classifies imbalanced data sets better.
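
Two of the ideas summarized in the abstract can be illustrated with a minimal Python sketch: iterative deletion of majority samples that disagree with their nearest neighbors, and distance-weighted voting among neighbors. This is not the thesis's RENN-BSMOTE or PTM-DWKNN. It assumes plain repeated edited-nearest-neighbor cleaning and inverse-distance weights, and it omits the Borderline-SMOTE oversampling step and the local k selection, whose details the abstract does not give. The function names, the default k and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def k_nearest(point, X, k, skip=None):
    """Indices of the k points in X closest to `point` (Euclidean distance).

    `skip` excludes one index, so a sample is not its own neighbor.
    """
    candidates = (i for i in range(len(X)) if i != skip)
    return sorted(candidates, key=lambda i: math.dist(point, X[i]))[:k]


def renn_clean(X, y, majority_label, k=3):
    """Assumed cleaning rule: repeatedly delete majority samples for which
    fewer than half of the k nearest neighbors share their label, until a
    full pass deletes nothing. Minority samples are never deleted.
    """
    X, y = list(X), list(y)
    while True:
        keep = []
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi == majority_label:
                nbrs = k_nearest(xi, X, k, skip=i)
                agree = sum(1 for j in nbrs if y[j] == yi)
                if 2 * agree < len(nbrs):
                    continue  # poorly matched to its neighborhood: drop it
            keep.append(i)
        if len(keep) == len(X):
            return X, y
        X = [X[i] for i in keep]
        y = [y[i] for i in keep]


def weighted_knn_predict(X, y, query, k=3, eps=1e-9):
    """Classify `query` by inverse-distance weighted voting among its k
    nearest neighbors (an assumed weighting; closer neighbors count more).
    """
    votes = defaultdict(float)
    for j in k_nearest(query, X, k):
        votes[y[j]] += 1.0 / (math.dist(query, X[j]) + eps)
    return max(votes, key=votes.get)


# Toy data: label 0 is the majority class, label 1 the minority class.
# The majority sample at (5, 5) sits inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 4), (4, 5), (6, 5)]
y = [0, 0, 0, 0, 0, 1, 1, 1]

X_clean, y_clean = renn_clean(X, y, majority_label=0, k=3)
prediction = weighted_knn_predict(X_clean, y_clean, (4.5, 4.5), k=3)
```

In the toy data, the majority sample at (5, 5) is surrounded by minority samples, so the cleaning pass removes it before the weighted vote is taken. With weighted voting, one very close minority neighbor can outvote two distant majority neighbors, which is the motivation for weighting by distance on imbalanced data.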

Keywords/Search Tags:

Imbalanced data set, Mixed sampling, Feature selection, Classification

Related items

1	Feature Selection And Classification For Imbalanced Medical Data
2	Research On Methods For Imbalanced Data Classification
3	Research On Feature Selection Algorithm On Imbalanced Data Classification
4	Research On Imbalanced Data Classification Learning Algorithm Based On Mixed Sampling Technique And Adaboost Principle
5	Research On Imbalanced Data Classification Algorithm Based On Feature Selection
6	Research On Classification Method Of High-dimensional Class-imbalanced Data Sets Base On SVM
7	Text Classification Algorithm Based On Imbalanced Data Sets
8	Feature Selection And Semi-supervised Classification For Imbalanced Data
9	An Analysis Of Combining Ensemble Sampling And Feature Selection Methods For Imbalanced Multi-Class Internet Traffic Classification
10	The Research Of Imbalanced Data Classification