
Research On Unbalanced Data Classification Based On Support Vector Mixed Sampling

Posted on: 2022-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: F Jiang
Full Text: PDF
GTID: 2518306326985779
Subject: Mathematics
Abstract/Summary:
Data classification has long been an active research topic because it helps uncover information hidden in data and makes that data more useful in production and daily life. In practice, however, many data sets are unbalanced, and classifiers perform poorly on them. Improving the classification of unbalanced data is therefore a research direction of practical value. Using the support vector machine (SVM) as the base classifier, this thesis improves the classification of unbalanced data from three angles.

First, data preprocessing. To address the poor classification that results from a large difference in sample counts between classes, a mixed sampling method based on boundary sample weighting is proposed. The method first de-noises the samples, then applies weighted over-sampling to the boundary samples of the minority class according to distance density, and finally under-samples the non-boundary samples to obtain the final sample set. Experiments on KEEL data sets and on credit card fraud detection are used to verify the superiority of the algorithm.

Second, support vector imbalance. A mixed sampling method based on support vector weighting is proposed to address the poor classification caused by the difference in the number of support vectors between classes. The samples are first de-noised after SVM classification, and the minority-class support vectors are over-sampled according to the distance-density principle. The majority-class non-support-vector samples are then processed with an ensemble under-sampling method. Results on UCI data sets and a human pose data set indicate that the algorithm outperforms the comparison algorithms.

Third, too few true support vectors. When the minority class has too few support vectors, and traditional sampling methods cannot recover the true minority support vectors because of the within-class distribution, an unbalanced data classification method based on SVM mixed sampling is proposed. The main idea is to find the true support vectors by over-sampling some of the support vectors. The method first over-samples the misclassified minority-class support vectors together with part of the correctly classified ones, and then selectively under-samples the majority-class non-support vectors. Results on UCI data sets show that this method clearly outperforms the comparison methods.

In summary, to address the poor classification of unbalanced data, this thesis proposes improvements from three angles: data preprocessing, support vector imbalance, and true support vectors. The experimental results show that all three methods improve the classification of unbalanced data to some extent.
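The first of the three methods (de-noise, weighted over-sampling of minority boundary samples, under-sampling of non-boundary samples) might be sketched roughly as below. This is only an illustration under stated assumptions, not the thesis's algorithm: the abstract does not give the distance-density weighting, so the sketch substitutes the number of opposite-class neighbours as a stand-in weight. The function names `nearest` and `mixed_sample`, the parameter `k`, the half-and-half target size, and the toy data are all hypothetical.

```python
import math
import random


def nearest(i, X, k):
    """Indices of the k samples closest to X[i], excluding i itself."""
    order = sorted((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    return order[:k]


def mixed_sample(X, y, minority=1, k=3, seed=0):
    """Hypothetical mixed-sampling sketch; see the assumptions above."""
    rng = random.Random(seed)

    # Step 1 (de-noising): drop samples whose k nearest neighbours
    # all carry a different label.
    keep = [i for i in range(len(X))
            if any(y[j] == y[i] for j in nearest(i, X, k))]
    X = [X[i] for i in keep]
    y = [y[i] for i in keep]

    # Step 2: count opposite-class neighbours; a count above zero
    # marks a boundary sample (assumed stand-in for distance density).
    n_other = [sum(y[j] != y[i] for j in nearest(i, X, k))
               for i in range(len(X))]
    min_idx = [i for i in range(len(X)) if y[i] == minority]
    maj_idx = [i for i in range(len(X)) if y[i] != minority]
    target = len(X) // 2  # assumed goal: roughly equal class sizes

    # Step 3: weighted over-sampling of minority boundary samples by
    # interpolating towards another randomly chosen minority sample.
    boundary_min = [i for i in min_idx if n_other[i] > 0]
    synthetic = []
    if boundary_min and len(min_idx) > 1:
        weights = [n_other[i] for i in boundary_min]
        for _ in range(max(0, target - len(min_idx))):
            a = rng.choices(boundary_min, weights=weights)[0]
            b = rng.choice([j for j in min_idx if j != a])
            t = rng.random()
            synthetic.append(tuple(u + t * (v - u)
                                   for u, v in zip(X[a], X[b])))

    # Step 4: keep every majority boundary sample and randomly
    # under-sample the non-boundary majority samples.
    boundary_maj = [i for i in maj_idx if n_other[i] > 0]
    inner_maj = [i for i in maj_idx if n_other[i] == 0]
    n_keep = max(0, min(len(inner_maj), target - len(boundary_maj)))
    kept_maj = boundary_maj + rng.sample(inner_maj, n_keep)

    out_X = [X[i] for i in min_idx] + synthetic + [X[i] for i in kept_maj]
    out_y = ([minority] * (len(min_idx) + len(synthetic))
             + [y[i] for i in kept_maj])
    return out_X, out_y


# Toy data: a 5x4 majority grid, four minority points at its right
# edge, and one minority "noise" point inside the majority region.
grid = [(float(i), float(j)) for i in range(5) for j in range(4)]
minority_pts = [(4.6, 1.0), (4.6, 2.0), (5.2, 1.5), (5.4, 1.5)]
noise = (1.5, 1.5)
X = grid + minority_pts + [noise]
y = [0] * len(grid) + [1] * (len(minority_pts) + 1)

out_X, out_y = mixed_sample(X, y)
print(out_y.count(0), out_y.count(1))
```

On this toy data the noise point is removed in step 1, and the two classes end up the same size. A resampled set like this would then be passed to the SVM base classifier for training.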
Keywords/Search Tags: Unbalanced data, Support Vector Machine, Over-sampling, Under-sampling