Research On Cover-based Algorithms For Oversampling On Imbalanced Data

Posted on:2022-08-04

Degree:Master

Type:Thesis

Country:China

Candidate:G L Tian

Full Text:PDF

GTID:2518306323991089

Subject:Computer technology

Abstract/Summary:

Classification has long been a central problem in machine learning and data mining. In many practical applications, however, real-world datasets are imbalanced, and traditional classification models can become biased toward the majority class, which leads to costly misclassifications. To address this issue, a variety of techniques for improving classification performance on imbalanced data have been studied, and sampling is a simple and effective one. Although a large number of sampling algorithms have been proposed, most of them only explore the relationships between individual samples. Few study the relationships between sample clusters, so they may fail to capture the overall distribution of a dataset. The main factors that influence classification performance on imbalanced data are the imbalance ratio, the size of the overlapping areas, the severity of intra-class sub-aggregation, and the proportion of noisy samples. Class overlap strongly degrades classification results, especially on datasets with a high imbalance ratio. Most existing studies focus on only one of these aspects, and few methods target datasets that are both highly imbalanced and highly overlapped.

To solve the above problems, this thesis explores the neighborhood relationships between samples and covers based on the Constructive Covering Algorithm (CCA) and proposes two new oversampling methods for classifying imbalanced data. The main contributions are as follows:

(1) An oversampling method based on cover structure, Cover-SMOTE, is proposed. CCA is used to construct covers by exploring the relationships between samples, and the relationships between covers are then extracted with the K-Nearest Neighbor algorithm. The distribution of the dataset is studied in this way to identify the key minority class samples from which synthetic tuples are generated. SMOTE is then applied to these key minority samples to obtain a balanced dataset. To verify the effectiveness of this method, extensive experiments compare it with the similar algorithms CTD and CTDE on 8 KEEL datasets under the same experimental conditions, and the results show that the proposed algorithm outperforms both. The superiority of Cover-SMOTE is further demonstrated by comparison with 4 other oversampling methods on several KEEL datasets.

(2) A new method for handling high imbalance and high overlap, quantum potential energy based cover optimization oversampling (QPCOO), is proposed. The algorithm optimizes sampling in the overlapping area and adjusts the decision boundary of the classifier by oversampling minority class samples. It constructs better covers based on the ranked quantum potential energy of the samples, so that minority covers in the class overlapping area can be extracted in a new granularity space. It then oversamples only on these extracted minority covers to balance the data. Comprehensive experiments on multiple public KEEL datasets have been conducted to evaluate the approach. The improvement in classification performance for several traditional algorithms (KNN, SVM and NB) validates the effectiveness of QPCOO as a data preprocessing step. In addition, comparison with state-of-the-art oversampling methods demonstrates the superiority of the proposed algorithm and supports the rationality and feasibility of its main ideas.
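
The SMOTE step named in contribution (1) can be illustrated with a minimal sketch. It covers only the generic SMOTE interpolation between a minority sample and one of its nearest minority neighbours. The selection of key minority samples through CCA covers is not reproduced here, so the input array is assumed to already contain those samples. The function name `smote_oversample` and its parameters `n_synthetic`, `k` and `seed` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def smote_oversample(key_minority, n_synthetic, k=5, seed=0):
    """Generic SMOTE interpolation (illustrative, not the thesis code).

    key_minority is assumed to hold the already-selected minority samples.
    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest neighbours within key_minority.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(key_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for every synthetic point.
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # Interpolate with a random gap in [0, 1).
    gap = rng.random((n_synthetic, 1))
    return X[base] + gap * (X[picked] - X[base])


# Toy usage: four minority samples on the corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_oversample(minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region those samples span. This is why the choice of which minority samples to interpolate from, which the thesis delegates to the cover structure, determines where the decision boundary moves.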

Keywords/Search Tags:

Imbalanced data, Cover, Oversampling, Quantum potential energy, Classification

Related items

1	Research Of Imbalanced Data Ensemble Classification Algorithm Based On Oversampling
2	Research And Application Of Imbalanced Data Classification Based On Oversampling Algorithm
3	Research On Imbalanced Data Classification Methods Based On Probabilistic Oversampling
4	Research Of Imbalanced Data Classification Method Based On Oversampling And Ensemble Learning
5	Research On Methods Of Imbalanced Data Set Classification
6	Research On Oversampling Method For Multi-class Imbalanced Learning
7	Research On Imbalanced Dataset Classification Based On Oversampling Technique
8	Researches On Oversampling Methods For Imbalanced Data
9	Research Of Imbalanced Datasets Preprocessing Combined With Clustering
10	Research On Under-sampling Algorithm For Imbalanced Data Based On Clustering And Its Application