Research On Classification Algorithm For Imbalanced Data Sets Based On Support Vector Machines

Posted on:2012-09-06

Degree:Master

Type:Thesis

Country:China

Candidate:S W Hao

Full Text:PDF

GTID:2218330368982094

Subject:Computer application technology

Abstract/Summary:

The rapid development of modern computer technology has led research and every area of social life to accumulate large amounts of data. Data mining techniques emerged and developed rapidly in order to convert these data into useful information and knowledge. In one class of data sets, known as imbalanced data sets, the number of samples in one class is far greater than the number in another, and the information provided by the minority class is often the more important. The classification of imbalanced data sets has therefore become an active research area in data mining.

The support vector machine (SVM) is a classification method built on statistical learning theory and has a solid theoretical basis. On ordinary data sets it performs better than other classification algorithms, but its classification results on imbalanced data sets are not very good. This thesis starts from the characteristics of imbalanced data sets and then proposes an under-sampling method based on clustering. After analyzing why SVM classification fails on imbalanced data sets, the proposed method under-samples the majority-class support vectors: it removes part of the majority-class samples to reduce the degree of imbalance between the majority and minority classes, and an SVM is then trained on the new sample set in order to improve classification accuracy.

Cost-sensitive learning is currently one of the popular approaches to classifying imbalanced data sets, but the SVM is not itself cost-sensitive and so is not suited to cost-sensitive data mining. The thesis therefore proposes a cost-sensitive SVM based on data set decomposition. Using posterior probability outputs and a meta-learning process, it reconstructs a new sample set that integrates the misclassification costs, and then trains an SVM on the reconstructed training set so that the resulting classifier minimizes misclassification cost.

Simulation experiments were carried out for each algorithm using different evaluation criteria. The experimental results and their analysis show that the two algorithms achieve good results in improving accuracy and in minimizing misclassification cost, respectively.
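The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it describes, not the thesis's actual algorithms. It assumes plain k-means as the clustering step (keeping one representative per cluster of majority samples) and a minimum-expected-cost rule for relabeling samples from posterior probabilities. The function names `kmeans`, `cluster_undersample` and `min_cost_label`, the cost matrix layout, and all parameter values are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples; returns (clusters, centers)."""
    k = min(k, len(points))
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters, centers


def cluster_undersample(majority, n_keep, seed=0):
    """Assumed scheme: keep the majority sample closest to each cluster center."""
    clusters, centers = kmeans(majority, n_keep, seed=seed)
    return [min(cl, key=lambda p: dist(p, c)) for cl, c in zip(clusters, centers) if cl]


def min_cost_label(posterior, cost):
    """Return the label with the lowest expected misclassification cost.

    posterior[j] is P(class j | x); cost[i][j] is the (assumed) cost of
    predicting class i when the true class is j.
    """
    n = len(cost)
    risk = [sum(posterior[j] * cost[i][j] for j in range(n)) for i in range(n)]
    return min(range(n), key=risk.__getitem__)


# Usage: shrink 20 majority samples to at most 4 representatives.
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
reduced_majority = cluster_undersample(majority, n_keep=4)

# Usage: class 0 = majority, class 1 = minority; missing a minority sample costs 10.
cost_matrix = [[0, 10], [1, 0]]
new_label = min_cost_label([0.8, 0.2], cost_matrix)  # expected costs: 2.0 vs 0.8
```

In this sketch the reduced majority set, combined with the untouched minority samples, would be passed to any SVM trainer. The relabeling rule shows how a sample that is probably majority can still be assigned the minority label once misclassification costs are taken into account.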

Keywords/Search Tags:

data mining, imbalanced data set, SVM, cost-sensitive

Related items

1	The Research Of Imbalanced Data Classification
2	The Application Of Improved AdaBoost Algorithm Based On Cost Sensitive In Imbalanced Data
3	Research On Multi-View Classification With Cost-Sensitive
4	Cost-sensitive Hierarchical Classification Method For Imbalanced Data
5	Research On Classification Algorithms For Extremely Imbalanced Data
6	Research On Imbalanced Data Classification Algorithms Based On Weight Analysis Of Loss Function
7	Research On Cost-Sensitive Classification Of Data Streams And Its Application
8	Research And Application Of Imbalanced Data Classification Algorithm Based On Ensemble Learning
9	Cost-sensitive boosting for classification of imbalanced data
10	Research On Ensemble Method Of Structured Support Vector Machine For Imbalanced Data