Research On Ensemble Method Of Structured Support Vector Machine For Imbalanced Data

Posted on: 2012-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: X M Yuan
GTID: 2218330338974190
Subject: Computer software and theory
Abstract/Summary:
Imbalanced data is widespread in practical applications, so learning classifiers from imbalanced data has become an active research topic. Results in this area are already applied in intrusion detection, credit card transaction analysis, and gene code information discovery. Evaluation measures designed for balanced data or cost-insensitive classification are no longer suitable, because classifying imbalanced data requires more attention to the minority class. Research has focused on three aspects: data, algorithms, and evaluation criteria, and has made some progress on each. Among existing methods for imbalanced data, variants of SVM have become the mainstream approach. StASVM, which is based on ASVM, introduces in-class structure information and effectively improves classification performance. In this paper, we integrate StASVM with ensemble learning and propose a series of ensemble methods. The main contributions are as follows:

1. We propose EStASVM, an ensemble algorithm based on StASVM. The training samples of the majority class are clustered, several sub-classifiers are induced from training samples extracted from the clustering result, and the sub-classifiers are then combined into an ensemble. This reduces the degree of imbalance between the classes. Experiments show that the ensemble effectively improves the stability and performance of classification.

2. We propose RsStASVM, an algorithm based on random subspaces, feature selection, and StASVM. The method samples the feature space of the original dataset and builds new training sets in the resulting feature subspaces, each of which is used to train a sub-classifier. Experiments show that the method handles imbalanced data well, especially high-dimensional data.

3. We design AdaStASVM, an algorithm based on cost-sensitive learning and AdaBoost, to address a shortcoming of EStASVM and RsStASVM: they cannot fully use the information inherent in the samples. Clustering the majority class yields prior knowledge that serves as the initial sample weights. During training, following the idea of AdaBoost, the cost of each sample is adjusted dynamically, which effectively improves classification performance on imbalanced data. Experiments show that AdaStASVM handles imbalanced data better than EStASVM and RsStASVM.
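
The clustering-based ensemble described in item 1 can be illustrated with a minimal Python sketch. It is not the thesis's implementation. StASVM is not available in any standard library, so a plain linear SVM trained by subgradient descent on the hinge loss stands in for it. The abstract does not specify the sampling or combination scheme, so pairing each majority cluster with all minority samples and combining sub-classifiers by majority vote are both assumptions. All function names (`kmeans`, `train_linear_svm`, `train_cluster_ensemble`, `predict`) and the synthetic data are hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return [c for c in clusters if c]

def train_linear_svm(X, y, lam=0.01, eta=0.02, epochs=1000):
    """Stand-in for StASVM (assumption): batch subgradient descent on the
    L2-regularised hinge loss of a linear classifier with labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # margin violated: add the hinge subgradient
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def train_cluster_ensemble(majority, minority, k=3):
    """One sub-classifier per majority cluster, each trained together with
    all minority samples, so every sub-problem is less imbalanced."""
    models = []
    for cluster in kmeans(majority, k):
        X = cluster + minority
        y = [-1] * len(cluster) + [1] * len(minority)
        models.append(train_linear_svm(X, y))
    return models

def predict(models, x):
    """Majority vote over the sub-classifiers (+1 = minority class)."""
    votes = sum(
        1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
        for w, b in models
    )
    return 1 if votes > 0 else -1

# Synthetic 9:1 imbalanced data, for illustration only.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
minority = [[rng.gauss(6, 0.5), rng.gauss(6, 0.5)] for _ in range(10)]
models = train_cluster_ensemble(majority, minority)
print(predict(models, [6.0, 6.0]), predict(models, [0.0, 0.0]))
```

With 90 majority samples split into three clusters, each sub-classifier sees roughly a 3:1 class ratio instead of 9:1, which is the imbalance-reduction effect the abstract refers to. Swapping `train_linear_svm` for a structure-aware or cost-sensitive SVM would bring the sketch closer to the method described.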
Keywords/Search Tags: imbalanced data, SVM, undersampling, cost-sensitive, ensemble learning