Ensembles With Location Based Subspace Resampling For Imbalanced Pattern Classification Problems

Posted on: 2017-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: T W Rong
Full Text: PDF
GTID: 2348330536453091
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and storage technologies, the amount of raw data is growing explosively, and imbalanced pattern classification problems have attracted increasing attention in recent years. In an imbalanced dataset, some classes contain far more samples than others. In many cases, samples in minority classes are much more important than those in majority classes, yet they are difficult for traditional classifiers to classify correctly because the minority data are underrepresented and the class distribution is severely skewed.

Recently, ensemble methods have proved effective for imbalanced pattern classification problems. IRUS, which is based on random undersampling (RUS), resamples fewer majority samples than minority samples to train an individual classifier biased towards the minority class. It then uses the bagging technique to construct diverse individual classifiers and combine them into a final ensemble classifier. RUSBoost uses RUS to change the distribution of the dataset and modifies the sample weights so that minority samples are more likely to be selected in the next round. However, RUS does not consider the distribution of the dataset and cannot construct a series of training datasets with reasonable diversity and validity. This makes ensemble learning over the feature spaces of the individual datasets difficult and requires a large number of individual classifiers.

This thesis proposes two ensemble methods that combine the Location Subspace Resampling method (LSR) with the Bagging and Boosting techniques. LSR divides the dataset into several subspaces according to the locations of samples in the feature space and then selects a reasonable number of samples from the different subspaces to construct training datasets for individual classifiers. When a training dataset is constructed, both the validity of that dataset and the diversity of the series of training datasets are considered. Compared to IRUS and RUSBoost, the proposed LSR-based methods are able to construct a series of training datasets with both high diversity and validity. Experimental results in this thesis show that the proposed methods perform better than other state-of-the-art random resampling methods on imbalanced datasets with diverse characteristics.
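The contrast between purely random undersampling and location-aware resampling can be illustrated with a small sketch. The abstract does not say how LSR forms its subspaces or how many samples it takes from each, so the distance-to-minority-centroid binning below is an assumption made only for illustration, not the thesis's method. The function names `inverse_rus` and `location_subspace_sample` and the parameters `ratio` and `n_subspaces` are hypothetical.

```python
import math
import random


def inverse_rus(majority, minority, ratio=0.5, rng=None):
    """Inverse random undersampling for one training set: keep every
    minority sample and draw fewer majority samples than minority samples.
    The ratio parameter is an illustrative assumption."""
    if rng is None:
        rng = random.Random()
    n_major = min(len(majority), max(1, int(len(minority) * ratio)))
    return rng.sample(majority, n_major), list(minority)


def location_subspace_sample(majority, minority, n_subspaces=3,
                             n_major=None, rng=None):
    """Hypothetical location-aware variant (NOT the thesis's LSR rule):
    rank majority samples by distance to the minority centroid, cut the
    ranking into contiguous bins ("subspaces"), and draw from every bin."""
    if not majority or not minority:
        raise ValueError("both classes must be non-empty")
    if rng is None:
        rng = random.Random()
    if n_major is None:
        n_major = len(minority)

    # Centroid of the minority class, one coordinate at a time.
    dim = len(minority[0])
    centroid = [sum(p[d] for p in minority) / len(minority) for d in range(dim)]

    # Order majority samples from nearest to farthest from that centroid.
    ranked = sorted(majority, key=lambda p: math.dist(p, centroid))

    # Split the ranking into at most n_subspaces bins of near-equal size.
    size = max(1, math.ceil(len(ranked) / n_subspaces))
    bins = [ranked[i:i + size] for i in range(0, len(ranked), size)]

    # Take the same number of samples from each bin so every region
    # of the feature space is represented in the training set.
    per_bin = max(1, n_major // len(bins))
    chosen = []
    for b in bins:
        chosen.extend(rng.sample(b, min(per_bin, len(b))))
    return chosen, list(minority)


if __name__ == "__main__":
    rng = random.Random(0)
    majority = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    minority = [(rng.uniform(4, 6), rng.uniform(4, 6)) for _ in range(20)]
    maj_a, _ = inverse_rus(majority, minority, ratio=0.5, rng=rng)
    maj_b, _ = location_subspace_sample(majority, minority, n_subspaces=4, rng=rng)
    print(len(maj_a), len(maj_b))
```

In a bagging-style ensemble, either function would be called once per individual classifier with fresh random draws. The random version can leave whole regions of the majority class out of a given training set by chance, while the binned version always includes samples from every region.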
Keywords/Search Tags: Imbalanced Pattern Classification Problems, Resampling Method, Ensembles, Location Subspace Resampling