Research On Unbalanced Learning Based On Sampling Method

Posted on: 2020-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Xu
Full Text: PDF
GTID: 2438330626453259
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Imbalanced datasets are common in real-world applications. Studies have shown that traditional machine learning algorithms, which aim to maximize overall classification accuracy, tend to assign test samples to the majority class while neglecting the recognition rate of the minority class. However, minority-class samples usually carry important information and are often the ones of interest. A classifier must therefore be designed with the imbalance of the dataset in mind; otherwise the learning algorithm may produce wrong decisions. There are two main families of methods for imbalanced learning: data-level methods and ensemble-level methods. Data-level methods generally comprise oversampling, undersampling, and hybrid sampling. Ensemble methods usually refer to algorithms that incorporate Bagging or Boosting. This thesis focuses on oversampling and undersampling (collectively, resampling) methods and then combines them with ensemble learning to further improve performance. First, sparse samples and samples whose neighbors belong to a different class are easily misclassified, so an oversampling method based on sample weighting is proposed. The method assigns larger weights to these two types of samples and synthesizes more samples from them, giving the learning algorithm new sample information. Second, research on support vector machines (SVM) shows that classification results are usually biased toward the majority class and that samples closer to the hyperplane retain more classification information. An undersampling method based on margin is therefore proposed: it translates the hyperplane by an appropriate distance into the majority-class region of the sample space and undersamples the majority-class samples that lie closer to the corrected hyperplane. Experimental results on the KEEL imbalanced datasets show that the proposed methods improve classification performance on imbalanced datasets.
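The sample-weighted oversampling idea can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so everything specific here is an assumption made for illustration: the function names `sample_weights` and `weighted_oversample`, the weight defined as normalized sparsity plus the share of majority-class neighbors, the neighborhood size `k`, and the SMOTE-style linear interpolation used to synthesize new points. It should not be read as the thesis's method.

```python
import math
import random


def sample_weights(minority, majority, k=3):
    """Hypothetical weights: sparse samples and samples surrounded by
    majority-class neighbours receive larger values."""
    sparsity, hetero = [], []
    for i, x in enumerate(minority):
        # Sorted distances to the other minority samples (self excluded).
        d_min = sorted(math.dist(x, m) for j, m in enumerate(minority) if j != i)
        near_min = d_min[:k]
        # Sparsity: mean distance to the k nearest minority neighbours.
        sparsity.append(sum(near_min) / max(1, len(near_min)))
        # Heterogeneity: fraction of the k nearest neighbours of any class
        # that belong to the majority class (label 1).
        labelled = [(d, 0) for d in d_min]
        labelled += [(math.dist(x, m), 1) for m in majority]
        nearest = sorted(labelled)[:k]
        hetero.append(sum(lab for _, lab in nearest) / max(1, len(nearest)))
    max_s = max(sparsity) or 1.0  # avoid division by zero
    # Small constant keeps every weight strictly positive.
    return [s / max_s + h + 1e-9 for s, h in zip(sparsity, hetero)]


def weighted_oversample(minority, majority, n_new, k=3, seed=0):
    """Draw seed samples in proportion to their weight and interpolate
    towards one of their k nearest minority neighbours."""
    rng = random.Random(seed)
    weights = sample_weights(minority, majority, k)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(minority)), weights=weights)[0]
        x = minority[i]
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )
        neighbour = rng.choice(others[:k]) if others else x
        gap = rng.random()  # position along the segment x -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Toy data: three clustered minority points plus one isolated point
# that sits inside the majority region.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
majority = [(3.1, 3.0), (3.0, 3.1), (2.9, 3.0), (5.0, 5.0)]
weights = sample_weights(minority, majority)
new_points = weighted_oversample(minority, majority, n_new=10)
```

In this toy setup the isolated minority point at (3.0, 3.0) is both the sparsest and the most heterogeneous, so it receives the largest weight and is chosen most often as the seed for synthetic samples.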
Keywords/Search Tags: Imbalanced dataset, Sampling, Classification, Ensemble learning, Support Vector Machine