Research And Application Of Support Vector Machine On Imbalanced Data Classification

Posted on: 2020-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
Full Text: PDF
GTID: 2439330572974187
Subject: Statistics
Abstract/Summary:
Support Vector Machine (SVM) is a machine learning algorithm based on statistical learning theory that performs very well on classification problems. When the data are imbalanced, however, SVM does not give satisfactory results: to minimize the total error, its separating hyperplane tilts toward the minority class. Starting from the fact that an SVM depends mainly on a small number of boundary points (the support vectors), this paper analyzes the shortcomings of existing classical algorithms and proposes an improved method, BOSMOTE. The method has four steps: (1) Select support vectors: train a cost-sensitive SVM to select the support vectors. (2) Boundary resampling: generate samples by interpolating between the minority-class support vectors and the majority-class points among their k-nearest neighbors. (3) Optimize the synthetic samples: interpolating toward the majority class may generate noise, so the particle swarm optimization algorithm is used to optimize the selected samples and ensure that the resampling process produces valid points. (4) Add the synthesized samples to the data set, train the SVM, and obtain the classifier. The method is verified on 9 public datasets from the KEEL imbalanced database, with G-mean, AUC, and F1 as evaluation indicators, and compared with the classical algorithms. The results show that the proposed algorithm has more stable classification performance on most of the datasets.

In the application, because the classes of employees who leave and employees who stay are imbalanced in reality, the proposed algorithm is applied to the study of employee turnover warning. The 1,100 records used in this application come from the IBM Watson Analytics platform. The model is built on factors such as employees' personal information, job information, and relationships with colleagues. It forecasts whether employees will leave in the future, and the forecasts are compared with the employees' actual turnover status. The results indicate that the proposed method is effective in improving the performance of the model, and comparison with the results of other improved methods demonstrates its stability.
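As a rough illustration only, the boundary resampling of step (2) and two of the evaluation indicators (G-mean and F1) might be sketched in plain Python as follows. This is not the thesis's implementation. The function names `boundary_interpolate` and `g_mean_and_f1` are hypothetical, the support-vector indices are assumed to come from a cost-sensitive SVM trained beforehand as in step (1), and the particle swarm optimization of step (3) is left out. The sketch also assumes that interpolation targets only the majority-class points among the k nearest neighbors.

```python
import math
import random


def boundary_interpolate(X, y, sv_idx, minority=1, k=5, seed=0):
    """Step (2) sketch: for each minority-class support vector, interpolate
    toward the majority-class points among its k nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i in sv_idx:
        if y[i] != minority:
            continue  # only minority-class support vectors seed new samples
        # k nearest neighbours of X[i] by Euclidean distance, excluding itself
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        for j in others[:k]:
            if y[j] == minority:
                continue  # assumption: interpolate only toward majority points
            gap = rng.random()  # SMOTE-style interpolation factor in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


def g_mean_and_f1(y_true, y_pred, positive=1):
    """G-mean = sqrt(recall * specificity); F1 = harmonic mean of
    precision and recall, with the minority class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return g_mean, f1
```

In a full pipeline the synthetic points would be labeled as minority class, filtered or adjusted by an optimizer as in step (3), and appended to the training set before the final SVM is trained as in step (4).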
Keywords/Search Tags: Imbalanced Dataset, Boundary Outside of SMOTE, Support Vector Machine