Research On Resampling Methods For Imbalance Data

Posted on:2019-07-24

Degree:Master

Type:Thesis

Country:China

Candidate:B Q Duan

Full Text:PDF

GTID:2428330551958754

Subject:Computer software and theory

Abstract/Summary:


Imbalanced data classification is an important problem in machine learning and pattern recognition. In practical applications, the minority class contains fewer samples but is often the more significant one. Traditional classification methods, however, tend to maximize overall accuracy. The resulting classifiers are therefore biased toward the majority class and neglect the minority class, which impairs their ability to recognize minority samples. In recent years, oversampling methods have been widely used for imbalanced classification. SMOTE (Synthetic Minority Oversampling Technique), proposed by Chawla, is a classical resampling algorithm. SMOTE alleviates the degree of imbalance to some extent but is prone to overfitting. Undersampling is another simple and intuitive resampling method: it removes majority samples until the numbers of majority and minority samples are roughly balanced. However, most undersampling methods eliminate majority samples indiscriminately and can easily discard valuable information in the majority class. To address these issues, this thesis studies both oversampling and undersampling.

1) DS-SMOTE (Density-based Synthetic Minority Oversampling Technique). DS-SMOTE identifies sparse samples according to sample density and uses them as the seed samples for resampling. It then creates synthetic samples between each seed and its neighbors using the SMOTE procedure, until the minority class is as large as the majority class. Experimental results show that, compared with similar methods, DS-SMOTE effectively improves classification accuracy on the minority class and has certain advantages for imbalanced data classification.

2) Dissimilarity-matrix-based undersampling classification method. Using each sample's dissimilarity matrix, this method divides the relationship between a sample and its neighbors into four situations, selectively removes majority-class samples, and adds a Boosting process to ensure that the samples are fully trained. Experimental results show that, compared with similar algorithms, this method greatly improves classification accuracy on the minority class.

This thesis discusses imbalanced data classification methods and proposes two resampling-based algorithms that effectively address the low classification performance on minority classes. Both algorithms still have limitations, however, and how to adapt them to real-world imbalanced datasets requires further study.
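The density-guided oversampling idea in 1) can be illustrated with a small sketch. This is not the thesis's DS-SMOTE. The abstract does not say how density is measured, so the sketch assumes a hypothetical proxy: the mean distance from a minority sample to its k nearest minority neighbours, where a larger value means a sparser region. The helper names `k_nearest`, `sparse_seeds`, and `density_smote` and the parameters `k` and `seed_fraction` are illustrative. The interpolation step is the standard SMOTE rule: new = seed + gap * (neighbour - seed), with gap drawn from [0, 1).

```python
import math
import random


def k_nearest(idx, points, k):
    """Indices of the k nearest neighbours of points[idx], excluding itself."""
    order = sorted((math.dist(points[idx], p), j)
                   for j, p in enumerate(points) if j != idx)
    return [j for _, j in order[:k]]


def sparse_seeds(minority, k, n_seeds):
    """Rank minority samples from sparsest to densest and keep n_seeds of them.

    Hypothetical density proxy (an assumption, not from the thesis): mean
    distance to the k nearest minority neighbours; larger = sparser.
    """
    scores = []
    for i in range(len(minority)):
        nbrs = k_nearest(i, minority, k)
        mean_d = sum(math.dist(minority[i], minority[j]) for j in nbrs) / len(nbrs)
        scores.append((mean_d, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:n_seeds]]


def density_smote(minority, n_majority, k=5, seed_fraction=0.5, rng=None):
    """Create synthetic minority samples until the two classes are equal in size.

    k and seed_fraction are illustrative values, not parameters from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    if rng is None:
        rng = random.Random(0)
    k = min(k, len(minority) - 1)
    n_needed = n_majority - len(minority)
    if n_needed <= 0:
        return []
    n_seeds = max(1, int(len(minority) * seed_fraction))
    seeds = sparse_seeds(minority, k, n_seeds)
    synthetic = []
    for s in range(n_needed):
        i = seeds[s % len(seeds)]                   # cycle through the sparse seeds
        j = rng.choice(k_nearest(i, minority, k))   # random minority neighbour of the seed
        gap = rng.random()                          # SMOTE interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(minority[i], minority[j])])
    return synthetic
```

For example, with `minority = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]`, the isolated point `[1.0, 1.0]` ranks as the sparsest seed. `density_smote(minority, 10, k=2)` returns 6 synthetic points, which brings the minority class up to the majority size of 10. Seeding from sparse regions is one plausible reading of "sparse samples as seeds". The actual density measure and seed-selection rule would have to come from the full text.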
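The undersampling side in 2) starts from a dissimilarity matrix, meaning pairwise distances between samples. The abstract does not define the four neighbour situations, so the sketch below substitutes a simple rule of its own, similar to Wilson's edited nearest neighbours applied only to the majority class. A majority sample is removed when minority samples form a strict majority of its k nearest neighbours. It omits the Boosting stage entirely. Euclidean distance and the names `dissimilarity_matrix` and `undersample` are assumptions for illustration.

```python
import math


def dissimilarity_matrix(points):
    """Pairwise Euclidean distances (assumed metric; the thesis may use another)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def undersample(points, labels, majority_label, k=3):
    """Return indices of samples to keep.

    Stand-in rule (not the thesis's four-situation rule): drop a majority
    sample if more than half of its k nearest neighbours are minority samples.
    Minority samples are always kept.
    """
    D = dissimilarity_matrix(points)
    keep = []
    for i, y in enumerate(labels):
        if y != majority_label:
            keep.append(i)
            continue
        order = sorted((D[i][j], j) for j in range(len(points)) if j != i)
        nbrs = [j for _, j in order[:k]]
        n_minority = sum(1 for j in nbrs if labels[j] != majority_label)
        if 2 * n_minority <= k:   # keep unless minority neighbours dominate
            keep.append(i)
    return keep
```

In the small example used in the check, a majority point at `[5, 5]` surrounded by three minority points is dropped, while the compact majority cluster near the origin is kept. Removing majority samples based on their neighbourhood, rather than at random, is the general idea the abstract describes. The exact situations and how each is handled are specific to the thesis.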

Keywords/Search Tags:

Imbalanced datasets, classification, oversampling, undersampling


Related items

1	Research On Under-sampling Classification Method Of Unbalanced Data
2	Research On Imbalanced Datasets Classification Based On Machine Learning And Oversampling Methods
3	Study Of Efficient Feature Selection And Classification Methods For Gene Expression Microarray Datasets
4	Research Of Imbalanced Datasets Preprocessing Combined With Clustering
5	Research On Classification Algorithm Based On Hybrid Sampling For Imbalanced Data
6	Research On Methods For Classifying Imbalanced Data
7	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
8	Research On Classification Method For Imbalanced Datasets
9	Search On Photonics-Assisted Broadband RF Receiver Based On Optical Undersampling
10	Alleviating class imbalance using data sampling: Examining the effects on classification algorithms