
Research On Imbalanced Datasets Classification Based On Machine Learning And Oversampling Methods

Posted on: 2022-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
Full Text: PDF
GTID: 2518306527977859
Subject: Computer technology
Abstract/Summary:
Classification of imbalanced datasets is a common topic in machine learning and data mining. Traditional learning methods mostly assume datasets with an imbalance ratio close to 1. In practical applications, however, the imbalance ratio is often large. To minimize its overall error rate, a classifier then tends to favor the majority class, which harms the classification results. As research has deepened, many algorithms for imbalanced datasets have been proposed, effectively reducing the impact of imbalanced data on classifier performance. This thesis approaches the problem from the data level, studying and improving methods for classifying imbalanced datasets. The main work is as follows:

(1) To address the problems that noise samples affect classification results and that the Synthetic Minority Oversampling Technique (SMOTE) has poor resistance to noise samples, a denoising oversampling algorithm is proposed. The algorithm first identifies noise samples using sample location and neighbor information and filters the noise samples out of the minority class. The K-Means++ algorithm is then used to synthesize minority class samples from the cluster centers, yielding a balanced training set. Experiments were performed on 21 KEEL imbalanced datasets using support vector machine (SVM) and multi-layer perceptron (MLP) classifiers, and the improved algorithm was compared with several existing oversampling methods. The results show that the denoising oversampling algorithm has a degree of noise resistance and can improve the classifier's overall classification ability.

(2) To address the problem that the Radial-Based Oversampling (RBO) algorithm easily synthesizes repeated samples, the Levy Flight and Radial-Based Oversampling algorithm (LRBO) is proposed. On 21 imbalanced datasets, the experiment compared the sampling and classification effects of LRBO with no oversampling, the RBO algorithm, and several existing oversampling algorithms, analyzing the sampling effect maps and the values of each classification evaluation index. The results show that, in imbalanced dataset classification, oversampling can reduce the classifier's bias against the minority class. Compared with RBO, when using SVM and MLP, the F-score increased by 1.3 and 7.7 percentage points respectively, the G-mean by 4.8 and 5.6 percentage points respectively, and the AUC by 2.2 and 2.4 percentage points respectively, indicating that LRBO can improve both the classifier's accuracy on the minority class and its overall classification ability.

(3) To address multi-class imbalance problems, a multi-class oversampling algorithm based on Levy flight and radial basis (MC-LLRBO) is proposed. The algorithm applies LRBO to multi-class imbalance problems and uses linear discriminant analysis (LDA) to improve the efficiency of oversampling. Experiments on 16 KEEL multi-class imbalanced datasets compared the classification effects of the improved algorithm with MC-RBO and several common oversampling methods, and the experimental data were analyzed to verify the effectiveness of the improved algorithm. The results show that, compared with other oversampling algorithms, MC-LLRBO can reduce the probability that the classifier misclassifies all samples of a given minority class. Compared with MC-RBO, when using SVM and MLP, the mAUC of MC-LLRBO increased by 12.9 and 1.5 percentage points respectively, indicating that MC-LLRBO can improve the classifier's average classification ability.
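
The abstract reports results in terms of the imbalance ratio, F-score, and G-mean without defining them. The sketch below uses the commonly cited definitions, which are assumed here and not taken from the thesis; the function names and the toy labels are purely illustrative. It shows why a classifier that always predicts the majority class can look accurate while being useless on the minority class.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Majority class size divided by minority class size (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, F-score and G-mean, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority-class accuracy
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority-class accuracy
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f_score": f_score,
        "g_mean": math.sqrt(recall * specificity),
    }


# Toy data: 95 majority samples (label 0) and 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
majority_only = binary_metrics(y_true, [0] * 100)

# A classifier that finds 4 of the 5 minority samples at the cost of
# 10 false alarms among the majority samples.
balanced_pred = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
balanced = binary_metrics(y_true, balanced_pred)

print(imbalance_ratio(y_true), majority_only, balanced)
```

On this toy data the majority-only classifier has the higher accuracy, yet its F-score and G-mean are both zero, which is the kind of majority bias that oversampling methods aim to reduce.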
Keywords/Search Tags:Imbalance Datasets, Oversampling, Noise samples, Levy flight, Multi-class