Research On The Problem About Unbalanced Data With Balanced Sampling Method

Posted on: 2015-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2298330467485919
Subject: Control theory and control engineering
Abstract/Summary:
After studying the imbalanced data problem, we propose a balanced sampling method. It combines the oversampling and undersampling resampling strategies so that the sizes of the minority class and the majority class are changed at the same time. A classifier is trained on each balanced data set, and the classifiers obtained by repeated sampling are combined by voting to produce the target classifier, which gives the overall classification.

Experiments used data sets selected from the data provided by UCI (abalone, balance, mf-kar, mf-morph, mf-zernike, wpbc, haberman, car, pima, ionosphere and wdbc) and three protein subcellular localization prediction data sets: Gram-negative bacteria, Gram-positive bacteria and viruses. Classification performance was assessed with evaluation indices chosen for imbalanced classification problems. The experimental results demonstrate the validity and applicability of the algorithm over a wide range of class imbalance.

We also conducted experiments on the protein subcellular localization problem, which is more complicated than the UCI data sets. The protein sequences in the selected data sets were converted into numeric feature vectors, and the proposed method was then applied to these vectors. The results show that the proposed method can still identify more positive-class data in a highly imbalanced problem, where it performs better than traditional imbalanced classification algorithms.
Keywords/Search Tags: Data imbalance, Machine learning, Ensemble learning, Under-sampling, Over-sampling
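
The balanced sampling procedure summarized in the abstract (resample both classes, train one classifier per balanced set, combine by voting) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the problem is binary, both classes are resampled to the mean of the two class sizes, the base learner is a simple nearest-centroid classifier, and all function names (`balanced_sample`, `fit_ensemble`, `predict_ensemble`, and so on) are hypothetical.

```python
import random
from collections import Counter


def balanced_sample(X, y, rng):
    """Resample a binary data set so both classes have the same size.

    Assumption: the common target size is the mean of the two class sizes.
    """
    minority_label = min(Counter(y).items(), key=lambda kv: kv[1])[0]
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    target = (len(minority) + len(majority)) // 2
    # Oversample the minority class: draw with replacement up to the target.
    over = [rng.choice(minority) for _ in range(target)]
    # Undersample the majority class: draw without replacement down to the target.
    under = rng.sample(majority, target)
    idx = over + under
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_centroids(X, y):
    """Hypothetical base learner: store the mean feature vector of each class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def fit_ensemble(X, y, n_rounds=11, seed=0):
    """Train one base classifier per independently drawn balanced sample."""
    rng = random.Random(seed)
    return [fit_centroids(*balanced_sample(X, y, rng)) for _ in range(n_rounds)]


def predict_ensemble(models, row):
    """Combine the base classifiers by majority vote."""
    votes = Counter(predict_centroids(model, row) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: 40 majority points near the origin, 4 minority points near (5, 5).
    X = [[0.1 * (i % 5), 0.1 * (i % 3)] for i in range(40)]
    X += [[5.0 + 0.1 * i, 5.0] for i in range(4)]
    y = [0] * 40 + [1] * 4
    models = fit_ensemble(X, y)
    print(predict_ensemble(models, [5.1, 5.0]))
```

An odd `n_rounds` avoids tied votes in the binary case. The base learner and the target size are the two places where a real experiment would most likely differ from this sketch.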