Research And Application On Imbalanced Data Set Classification Problems

Posted on:2015-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:B Y Sun

Full Text:PDF

GTID:2268330425996666

Subject:Computer application technology

Abstract/Summary:

The classification of imbalanced data sets is one of the most challenging research problems in data mining and machine learning. In recent years, with the development of computer technology and the progress of information technology, more and more decisions need support from data. In the age of big data, classification based on data mining techniques has become a powerful means of immediate decision-making, precision marketing, and even improving overall competitiveness. Imbalanced data sets are a form of data that exists in real-world domains and that truly and objectively reflects the essential characteristics of what it describes. In short, people care about only a small part of a huge body of data, but this small part is usually hidden within the rest, so it cannot be separated out accurately. Imbalanced data classification is a difficult data mining problem, and many common strategies designed for traditional classification cannot handle imbalanced data sets well, so it is attracting more and more attention from experts and scholars around the world.

This thesis first introduces the concept of imbalanced data sets and the progress that researchers worldwide have made on the imbalanced data classification problem. It then explains why the problem is so difficult to solve, the treatments commonly adopted for it, and the metrics used to evaluate classification performance. Taking into account the scarcity of information in imbalanced data, data submerging, and the information loss caused by sampling, the thesis proposes a re-sampling strategy for imbalanced data that is based on resampling at cluster boundaries. Combined with an ensemble learning method based on support vector machines, the thesis addresses the imbalanced data classification problem from two sides: the data and the algorithm.

In the experimental verification and analysis, the effectiveness of this strategy is validated on four typical forms of imbalanced data sets. Finally, the approach, combined with the ensemble learning method, is applied to telecom customer relationship prediction: using real telecom customer relationship data, the sampling and classification strategy is integrated into the system and also achieves better classification results in practical application.
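
The abstract mentions metrics for evaluating classification performance without naming them. As a minimal sketch, assuming the metrics commonly used for imbalanced data (recall, precision, F1, and G-mean; this choice is an assumption, not taken from the thesis), the following Python shows why plain accuracy can be misleading when one class is rare. The function names and the toy 95:5 data are hypothetical.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy plus metrics that are sensitive to the minority class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority-class hit rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority-class hit rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    g_mean = sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical toy data: 95 majority samples (label 0), 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the majority class.
always_majority = [0] * 100

scores = imbalance_metrics(y_true, always_majority)
print(scores)
```

On this toy data the trivial classifier reaches 95% accuracy while its recall, F1, and G-mean are all zero, because it never detects a single minority sample. That gap is one reason minority-sensitive metrics are usually preferred over accuracy for imbalanced problems.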

Keywords/Search Tags:

imbalanced data, classification, re-sampling, ensemble learning

Related items

1	Research On Imbalanced Data Classification Based On Sampling Method And Ensemble Learning
2	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
3	The Research Of Imbalanced Data Classification
4	The Algorithm Research Of Associative Classification And Classification Based On Imbalanced Data
5	Imbalanced Data Classification Algorithm Based On Unsupervised Intelligent Under Sampling Method
6	Complaints Text Classification Research Of Imbalanced Data Sets
7	Research On Ensemble Approach For Classification Of Imbalanced Data Sets
8	Research And Application On Imbalanced Data Set Classification Problems
9	Hybrid Ensemble Learning For Imbalanced Data
10	The Research Of Imbalanced Data Based On Oversampling Technique