Research On Unbalanced Learning Based On Sampling Method

Posted on:2020-03-01

Degree:Master

Type:Thesis

Country:China

Candidate:H Xu


GTID:2438330626453259

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:


Imbalanced datasets are common in real-world applications. Studies have shown that traditional machine learning algorithms, which aim to maximize overall classification accuracy, tend to assign test samples to the majority class while neglecting the recognition rate of the minority class. However, minority-class samples usually carry important information and are often the samples of greatest interest. A classifier must therefore be designed with the imbalance of the dataset in mind; otherwise the learning algorithm may make wrong decisions.

There are two main families of imbalanced-learning methods: data-level methods and ensemble-level methods. Data-level methods generally include oversampling, undersampling, and hybrid sampling. Ensemble methods usually refer to algorithms that incorporate Bagging or Boosting. This thesis focuses on oversampling and undersampling (together referred to as resampling) and then combines them with ensemble learning to further improve performance.

First, sparse samples and samples whose neighbors belong to a different class are easily misclassified. To address this, an oversampling method based on sample weighting is proposed. It assigns larger weights to these two types of samples and synthesizes more new samples around them, adding new sample information to the learning algorithm.

Second, research on the support vector machine (SVM) shows that its classification results are usually biased towards the majority class, and that samples closer to the hyperplane carry more classification information. We therefore propose a margin-based undersampling method, which shifts the hyperplane an appropriate distance into the majority-class region of the sample space and then undersamples the majority class by selecting the majority-class samples that lie closer to this corrected hyperplane.

Experimental results on imbalanced datasets from the KEEL repository show that the proposed methods improve classification performance on imbalanced data.
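The claim that accuracy-maximizing classifiers favor the majority class can be checked with a small calculation. The sketch below assumes a 95:5 class split and a classifier that always predicts the majority class. It reports plain accuracy next to the geometric mean (G-mean) of the two per-class recalls, a metric commonly used in imbalanced learning; the abstract does not state which metrics the thesis itself reports.

```python
import math


def accuracy_and_gmean(y_true, y_pred, positive=1):
    """Return (accuracy, G-mean), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    tnr = tn / (tn + fp) if tn + fp else 0.0  # recall on the majority class
    return accuracy, math.sqrt(tpr * tnr)


# 95 majority samples (label 0) and 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100
acc, gmean = accuracy_and_gmean(y_true, y_pred)
print(f"accuracy={acc:.2f}, G-mean={gmean:.2f}")  # accuracy=0.95, G-mean=0.00
```

The trivial classifier scores 95% accuracy while recognizing no minority sample at all, which is why the G-mean drops to zero.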
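The abstract describes the weighted oversampling method only at the level of its idea: sparse samples and samples with neighbors of the other class get larger weights, and more samples are synthesized around them. The following is a minimal sketch under assumed choices that are not taken from the thesis. Sparsity is the mean distance to the k nearest minority neighbors, heterogeneity is the share of majority samples among the k nearest neighbors overall, the weight is the normalized sparsity plus the heterogeneity, and new points are generated by SMOTE-style linear interpolation. The function name `weighted_oversample` and the additive weighting are hypothetical.

```python
import math
import random


def weighted_oversample(minority, majority, n_new, k=5, seed=0):
    """Weight-guided oversampling sketch (hypothetical weighting scheme).

    Requires at least two minority samples. Each minority sample gets
    weight = normalized sparsity + heterogeneity, where
      * sparsity is the mean distance to its k nearest minority neighbors,
      * heterogeneity is the share of majority samples among its k nearest
        neighbors in the whole dataset.
    Seeds are drawn in proportion to the weights, and new points are
    interpolated SMOTE-style towards a random minority neighbor.
    """
    rng = random.Random(seed)
    # Minority samples come first, so labelled[i] is minority[i].
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    sparsity, hetero, neighbors = [], [], []
    for i, p in enumerate(minority):
        same = sorted((q for j, q in enumerate(minority) if j != i),
                      key=lambda q: math.dist(p, q))[:k]
        neighbors.append(same)
        sparsity.append(sum(math.dist(p, q) for q in same) / len(same))
        nearest = sorted((item for j, item in enumerate(labelled) if j != i),
                         key=lambda item: math.dist(p, item[0]))[:k]
        hetero.append(sum(1 for _, y in nearest if y == 0) / len(nearest))
    max_s = max(sparsity) or 1.0  # avoid dividing by zero
    weights = [s / max_s + h for s, h in zip(sparsity, hetero)]
    if sum(weights) == 0:  # degenerate case: fall back to uniform weights
        weights = [1.0] * len(minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(minority)), weights=weights)[0]
        p, q = minority[i], rng.choice(neighbors[i])
        gap = rng.random()  # position along the segment p -> q, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic, weights


minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (2.0, 2.0)]
majority = [(1.9, 2.1), (2.1, 1.9), (2.2, 2.2), (3.0, 3.0), (0.5, 2.5), (3.0, 0.5)]
new_points, weights = weighted_oversample(minority, majority, n_new=8, k=3)
# The isolated minority sample (2.0, 2.0), surrounded by majority samples,
# gets the largest weight and is therefore the most likely seed.
print(weights.index(max(weights)))  # 3
```

Adding the two terms is only one way to combine them; a product or a tunable mix would emphasize samples that are both sparse and surrounded by the other class.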
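The margin-based undersampling step can be sketched for a linear decision function. The sketch assumes that a hyperplane f(x) = w·x + b has already been fitted (for example by a linear SVM), that the majority class lies on the side where f(x) < 0, and that the shift distance `delta` is chosen by the user; the abstract only says "an appropriate distance". It also assumes that the retained majority samples are those nearest the corrected hyperplane. The name `margin_undersample` is hypothetical.

```python
import math


def margin_undersample(majority, w, b, delta, n_keep):
    """Margin-based undersampling sketch.

    Assumptions (not taken from the thesis): a linear decision function
    f(x) = w.x + b has already been fitted, e.g. by a linear SVM; the
    majority class lies on the side where f(x) < 0; and `delta` is a
    user-chosen shift distance.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))

    def dist_to_shifted_plane(x):
        # Signed distance to the original hyperplane f(x) = 0.
        signed = (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        # The corrected plane sits `delta` units into the majority side,
        # i.e. where signed == -delta.
        return abs(signed + delta)

    # Keep the n_keep majority samples closest to the corrected hyperplane.
    return sorted(majority, key=dist_to_shifted_plane)[:n_keep]


majority = [(-0.2, 0.0), (-1.05, 0.5), (-0.7, -0.3), (-3.0, 1.0), (-5.0, 0.0)]
# Original hyperplane x1 = 0; corrected hyperplane x1 = -1.
print(margin_undersample(majority, w=(1.0, 0.0), b=0.0, delta=1.0, n_keep=2))
# [(-1.05, 0.5), (-0.7, -0.3)]
```

With `delta=1.0` the sample nearest the original hyperplane, (-0.2, 0.0), is no longer kept; the selection moves to samples around the corrected position.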
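The abstract says the resampling methods are combined with Bagging- or Boosting-style ensembles but does not say how. One common generic pattern, in the spirit of UnderBagging, trains each ensemble member on all minority samples plus a random majority subset of the same size and combines members by majority vote. The sketch below assumes that pattern with a toy nearest-centroid base learner; the function names are hypothetical and this is not presented as the thesis's method.

```python
import math
import random


def _centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def under_bagging(minority, majority, n_estimators=11, seed=0):
    """Train a toy nearest-centroid ensemble on class-balanced subsets.

    Each member sees every minority sample plus an equally sized random
    draw (with replacement) from the majority class.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        neg = [rng.choice(majority) for _ in minority]
        models.append((_centroid(minority), _centroid(neg)))
    return models


def predict(models, x):
    """Majority vote over members: 1 = minority class, 0 = majority class."""
    votes = sum(1 for c_pos, c_neg in models
                if math.dist(x, c_pos) < math.dist(x, c_neg))
    return 1 if 2 * votes > len(models) else 0


minority = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6)]
majority = [(5.0, 5.0), (5.5, 4.8), (4.7, 5.3), (6.0, 6.0), (5.2, 5.9), (4.9, 4.4)]
models = under_bagging(minority, majority)
print(predict(models, (0.3, 0.3)), predict(models, (5.1, 5.1)))  # 1 0
```

An odd number of members avoids tied votes, and the random undersampling line is the place where a different resampling method could be substituted.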

Keywords/Search Tags:

Imbalanced dataset, Sampling, Classification, Ensemble learning, Support Vector Machine


Related items

1	Research On Classification Algorithms For Imbalanced Dataset
2	Support Vector Machine Based Classification Algorithms Research For Imbalanced Data
3	Application Research Of Used-car Recommendation Based On Classification Method On Imbalanced Data Sets
4	Research On Ensemble Approach For Classification Of Imbalanced Data Sets
5	Research On Support Vector Machine Classification Method For Imbalanced Datasets
6	Imbalanced Data Classification And Its Application In The Prediction Of The Mobile Phone Replacement
7	The Algorithm And Application Research Of Relevance Vector Machine For Large-scale Datasets
8	Research On Ensemble Learning
9	Research On Imbalanced Data Classification Methods Based On Ensemble Learning
10	Research On Imbalanced Dataset Classification Based On Ensemble Learning