The Research Of Imbalanced Data Classification Algorithm Based On Support Vector Machine

Posted on:2015-11-06

Degree:Master

Type:Thesis

Country:China

Candidate:S F Hong

Full Text:PDF

GTID:2298330422988594

Subject:Computer technology

Abstract/Summary:

In the era of information explosion, the sheer volume of data has attracted wide attention, and it has become necessary to discover the regular patterns in data and make full use of them. Classification is one of the most frequently encountered problems in data processing and has become an important research topic in machine learning. Compared with traditional classification methods, the support vector machine (SVM) has several merits: high generalization ability, absence of local minima, and suitability for high-dimensional, small-sample data. It can therefore better handle over-learning, the curse of dimensionality, and local minima, which is why this thesis focuses on SVM. The main idea of SVM is to map the training set to a high-dimensional space by means of a kernel function.

Previous studies have shown that SVM classifies balanced datasets well, but that it is difficult to obtain excellent results on imbalanced datasets. The main reason is that the classification hyperplane of an SVM is determined by only a small number of support vectors. When an SVM classifier is applied to a class imbalance problem, its prediction is biased: the more samples a class has, the smaller its classification error, and vice versa.

To solve the problems above, the major research work of this thesis is to study how to address the class imbalance problem with SVM. The main research contents include the following two aspects:

First, we proposed an SVM-based optimal decision threshold adjustment strategy (SVM-OTHR) and its ensemble version (EnSVM-OTHR) to handle the binary-class imbalance problem. We expect the SVM-OTHR algorithm to help answer a puzzling question: how far should the classification hyperplane be moved? Specifically, the strategy is self-adapting and can find the optimal moving distance of the classification hyperplane according to the distribution of the training samples. Furthermore, we extended the strategy to develop an ensemble version (EnSVM-OTHR) that can further improve classification performance. Experimental results on 10 skewed datasets from the UCI Data Repository indicated their superiority.

Second, we proposed an SVM-based ensemble learning algorithm to deal with high-dimensional, multiclass imbalanced data. The idea of the proposed algorithm is first to transform the multiclass problem into multiple binary-class problems by using the one-against-all coding strategy. Next, we introduce feature subspace, an evolved version of random subspace that can generate multiple diverse training subsets. Then, we apply one of two correction techniques, namely decision threshold adjustment or random under-sampling, to each training subset to alleviate the harm caused by class imbalance. Finally, SVM is used as the base classifier, and a novel voting rule called counter voting is presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicated that the presented method clearly outperformed many traditional classification methods and improved classification performance to a large extent.
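
The idea of decision threshold adjustment for a binary classifier can be illustrated with a minimal sketch. The abstract does not state the criterion that SVM-OTHR optimizes, so the code below is not the thesis's algorithm: it assumes, purely for illustration, that the decision values f(x) of an already trained SVM are available, and it searches for the threshold that maximizes the G-mean (the geometric mean of the true-positive and true-negative rates) on the training samples. The function names, the G-mean criterion, and the toy decision values are all assumptions.

```python
import math


def g_mean(labels, scores, threshold):
    """Geometric mean of the true-positive and true-negative rates.

    A sample is predicted positive (+1) when its score exceeds `threshold`.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == -1 and s <= threshold)
    return math.sqrt((tp / n_pos) * (tn / n_neg))


def best_threshold(labels, scores):
    """Return the threshold (hyperplane shift) with the highest G-mean.

    Candidates are the midpoints between consecutive distinct scores.
    The default SVM threshold 0.0 is kept unless a candidate is strictly better.
    """
    ordered = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best, best_value = 0.0, g_mean(labels, scores, 0.0)
    for t in candidates:
        value = g_mean(labels, scores, t)
        if value > best_value:
            best, best_value = t, value
    return best


# Hypothetical decision values f(x) = w.x + b, made up for illustration.
# The minority class is +1; the default boundary sits too close to it.
labels = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
scores = [0.4, -0.3, -0.5, -0.9, -1.1, -1.2, -1.4, -1.5, -1.8, -2.0, -2.2, -2.5]

shift = best_threshold(labels, scores)
print("G-mean at default threshold:", g_mean(labels, scores, 0.0))
print("adjusted threshold:", shift)
print("G-mean at adjusted threshold:", g_mean(labels, scores, shift))
```

On this toy data the search moves the threshold from 0 to about -0.7, that is, toward the majority class, which raises the G-mean from about 0.58 to 1.0. A real experiment would choose the threshold on held-out data rather than on the training scores themselves.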

Keywords/Search Tags:

Support Vector Machine (SVM), Class Imbalance Learning, Ensemble Learning, Classification, DNA Microarray Data

Related items

1	The Research Of Classification Algorithm Based On Support Vector Machine
2	Research On Fuzzy Support Vector Machine Algorithm For Class Imbalance Learning
3	Parallel Min-Max Modular Support Vector Machine With Application To Patent Classification
4	Research On Multi-instance Learning Based On Support Vector Machine
5	Research Of Learning Methods On Single-class Support Vector Machine
6	Web Pages Classification Based On Imbalanced Support Vector Machine Learning
7	Multispectral Data Classification Based On Support Vector Machines
8	Support Vector Machine Learning Algorithms Based On Within-Class Structure
9	Research And Application Of Discriminative Dictionary Learning Algorithms Based On Data Representation
10	Research And Application Of Imbalance Data Classification Based On Support Vector Machine