
Research on Classification Method of High-Dimensional Class-Imbalanced Data Sets Based on SVM

Posted on: 2018-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: J R Lu
GTID: 2348330533969245
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the classification of high-dimensional, class-imbalanced data has become a problem in many fields, such as bioinformatics. High dimensionality degrades classification results because some combinations of features have a negative influence on classification. Class imbalance means that one class has many more samples than another, which makes the classifier focus on the majority class at the expense of the minority class and thus harms classification. High-dimensional, class-imbalanced data sets exhibit both problems at once, and the two interact to produce a new, combined problem. Many researchers have studied high dimensionality and class imbalance separately and proposed a range of algorithms for each; however, when both problems occur in the same data set, most approaches still treat them separately instead of addressing the combined problem.

This thesis first introduces the two problems and analyzes the new problem produced by their interaction. It then introduces the support vector machine (SVM), analyzes its advantages in handling high dimensionality and class imbalance, and reviews the strengths and weaknesses of existing algorithms for the two problems. Next, it improves SVM-RFE (SVM Recursive Feature Elimination) so that feature selection also takes class imbalance into account, and improves SMOTE (Synthetic Minority Over-sampling Technique) so that over-sampling is carried out in Hilbert space, exploiting the SVM's ability to locate the class boundary automatically, with the over-sampling rates set adaptively. Finally, it proposes a classification algorithm for high-dimensional, class-imbalanced data sets named BRFE-PBKS-SVM (Border-Resampling Feature Elimination and PSO Border-Kernel-SMOTE SVM), and a series of experiments using different evaluation metrics demonstrates its effectiveness.
Keywords/Search Tags: high-dimensional and class-imbalanced, SVM, feature selection, boundary samples, over-sampling
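As a rough illustration of the SVM-RFE idea the thesis starts from, the sketch below ranks features by the squared weights w_j^2 of a linear SVM and removes the lowest-ranked feature at each round. To make the ranking imbalance-aware it simply uses a class-weighted hinge loss with "balanced" costs n / (2 * n_class); this is an assumption chosen for brevity, not the thesis's BRFE method, whose border-resampling step the abstract does not detail. The helper names (`train_weighted_linear_svm`, `svm_rfe`) and the hyperparameters are hypothetical, and the SVM is trained by plain subgradient descent instead of a QP solver so that the example needs only the standard library.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_weighted_linear_svm(X, y, weights, lam=0.01, lr=0.1, epochs=300):
    """Hypothetical helper: full-batch subgradient descent on
    (1/n) * sum_i C_{y_i} * max(0, 1 - y_i * (w.x_i + b)) + (lam/2) * ||w||^2.
    Labels must be +1 / -1; `weights` maps each label to its cost C_y."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: hinge term active
                c = weights[yi]
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j] / n
                grad_b -= c * yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Plain SVM-RFE with class-weighted costs (not the thesis's BRFE):
    retrain, drop the feature with the smallest w_j^2, repeat."""
    n_pos = sum(1 for label in y if label == 1)
    n_neg = len(y) - n_pos
    # "Balanced" costs: errors on the smaller class are penalised more.
    weights = {1: len(y) / (2 * n_pos), -1: len(y) / (2 * n_neg)}
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_weighted_linear_svm(X_sub, y, weights)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated
```

On a toy set with three minority and six majority samples, where only the first feature separates the classes and the other two are symmetric noise within each class, the noise features receive zero weight and are eliminated first. Removing one feature per round is the simplest schedule; practical SVM-RFE implementations often drop a fraction of the remaining features per round to save retraining time.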
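For reference, the standard SMOTE step that the thesis builds on creates synthetic minority samples by interpolating, in input space, between a minority sample and one of its k nearest minority neighbours. The pure-Python sketch below shows only that baseline; the function name `smote`, the parameters `n_synthetic`, `k` and `seed`, and the fixed number of synthetic samples are illustrative assumptions, not the thesis's border-aware, adaptive, kernel-space variant.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Baseline input-space SMOTE (illustrative, not the thesis's PBKS).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Every synthetic point lies on a segment between two minority samples, so for minority samples at the corners of the unit square all generated coordinates stay within [0, 1]. Here the number of synthetic samples is fixed by the caller, whereas the thesis sets over-sampling rates adaptively and works in Hilbert space. Because the feature map φ is usually not available explicitly, one common way to over-sample in kernel space is to keep a synthetic point implicitly as φ_new = (1 - δ)φ(x_i) + δφ(x_j); by linearity of the inner product its kernel value with any sample z is K(z, new) = (1 - δ)K(z, x_i) + δK(z, x_j). The abstract does not state whether BRFE-PBKS-SVM uses exactly this construction.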