Research On Ensemble Method Of Structured Support Vector Machine For Imbalanced Data

Posted on:2012-08-30

Degree:Master

Type:Thesis

Country:China

Candidate:X M Yuan

Full Text:PDF

GTID:2218330338974190

Subject:Computer software and theory

Abstract/Summary:

Imbalanced data is widespread in practical applications, and learning classifiers from it has become an active research topic. Results in this area have been applied to intrusion detection, credit card transaction analysis and gene code information discovery. Evaluation criteria designed for balanced data or cost-insensitive classification are no longer suitable, because classification of imbalanced data must pay more attention to the minority class. Researchers have worked on three aspects, namely data, algorithms and evaluation criteria, and have made some progress. Among existing methods for imbalanced data, variants of SVM have become the mainstream approach. StASVM, which extends ASVM by introducing in-class structure information, has effectively improved classification performance. In this thesis, we integrate StASVM with ensemble learning and propose a series of ensemble methods. The main work is as follows:

1. We propose EStASVM, an ensemble algorithm based on StASVM. The training samples of the majority class are clustered, several sub-classifiers are induced from training samples extracted from the clustering result, and the sub-classifiers are then combined into an ensemble. This method reduces the degree of imbalance between the classes. Experiments show that the ensemble effectively improves the stability and performance of classification.

2. We propose RsStASVM, an algorithm based on random subspaces, feature selection and StASVM. The method samples the feature space of the original dataset and generates each sub-classifier from the training samples restricted to the sampled feature subspace. Experiments show that the method handles imbalanced data well, especially high-dimensional data.

3. We design AdaStASVM, an algorithm based on cost-sensitive learning and AdaBoost, to address a shortcoming of EStASVM and RsStASVM: they cannot make full use of the information inherent in the samples. Clustering the majority class yields prior knowledge that is used as the initial weights of the samples. During training, following the idea of AdaBoost, the cost of each sample is adjusted dynamically to improve classification performance on imbalanced data. Experiments show that AdaStASVM handles imbalanced data better than EStASVM and RsStASVM.
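
The cluster-then-sample idea described for EStASVM can be illustrated with a minimal sketch. This is not the thesis's implementation: StASVM is not available in any standard library, so a nearest-centroid rule stands in as the base learner, and the function names (`kmeans`, `balanced_subsets`, `fit_ensemble`), the proportional sampling rule and the majority vote are all assumptions made for illustration.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster happens to be empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

def balanced_subsets(majority, minority, k, n_subsets, seed=0):
    """Draw majority subsets of roughly minority size, spread over the clusters."""
    rng = random.Random(seed)
    clusters = [c for c in kmeans(majority, k, seed=seed) if c]
    subsets = []
    for _ in range(n_subsets):
        picked = []
        for c in clusters:
            # Assumed rule: each cluster contributes in proportion to its size.
            share = max(1, round(len(minority) * len(c) / len(majority)))
            picked.extend(rng.sample(c, min(share, len(c))))
        subsets.append(picked)
    return subsets

def fit_nearest_centroid(neg, pos):
    """Stand-in base learner (NOT StASVM): label 1 if closer to the minority centroid."""
    c_neg, c_pos = mean(neg), mean(pos)
    return lambda x: 1 if sq_dist(x, c_pos) < sq_dist(x, c_neg) else 0

def fit_ensemble(majority, minority, k=3, n_subsets=5, fit=fit_nearest_centroid):
    """Train one sub-classifier per balanced subset and combine them by majority vote."""
    subsets = balanced_subsets(majority, minority, k, n_subsets)
    models = [fit(sub, minority) for sub in subsets]

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```

Points are passed as tuples, with label 1 for the minority class and 0 for the majority class. The `fit` argument is the hook where a real SVM variant would be plugged in; an odd `n_subsets` avoids tied votes.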

Keywords/Search Tags:

imbalanced data, SVM, undersampling, cost-sensitive, ensemble learning

Related items

1	Research On Imbalanced Data Classification Methods Based On Ensemble Learning
2	Research And Application Of Imbalanced Data Classification Algorithm Based On Ensemble Learning
3	Hybrid Ensemble Learning For Imbalanced Data
4	Imbalanced Data Classification And Its Application In The Prediction Of The Mobile Phone Replacement
5	Research On Automatic Diagnosis Methods Of Breast Cancer Based On Cost-Sensitive Learning And Its Application
6	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
7	The Research Of Imbalanced Data Classification
8	Research Of Ensemble Learning For High-dimensional And Imbalanced Data Classification
9	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
10	Research On Imbalanced Data Issue In SAR Target Discrimination