Research On Support Vector Machine And Decision Tree Algorithm For Imbalanced Datasets

Posted on:2019-11-24

Degree:Master

Type:Thesis

Country:China

Candidate:M J Hou

GTID:2417330563958859

Subject:Applied statistics

Abstract/Summary:

Traditional machine learning methods assume that the number of samples tends to infinity. In practice, however, the number of samples is finite, and on such samples even well-performing traditional learning methods often give unsatisfactory results. The support vector machine (SVM), which is grounded in statistical learning theory, addresses problems that affect traditional machine learning methods, such as nonlinearity, local extrema, small samples, and high dimensionality. In practical research, although abundant data can be collected, the truly useful data are often scarce because of the characteristics of the datasets themselves or external factors. Datasets in which one class has far more samples than the others are called imbalanced datasets. Because the minority class lacks representative samples, the classification rules describing it are few and weak. If nothing is done to the data, existing algorithms tend to favor the majority class, which sharply lowers accuracy on the minority class. Yet the minority class is often the one of interest, because misclassifying a minority sample can have severe consequences. It is therefore necessary to find effective classification methods for imbalanced datasets.

This thesis studies the nature of classification on imbalanced datasets, data-processing methods, and evaluation metrics. In addition, drawing on statistical theory, it introduces CBO-SVM, RU-SMOTE-SVM, SMOTEBoosting-SVM, and the C5.0 decision tree model. It then carries out an empirical analysis of Taiwan credit card customer default data, together with ST (special treatment) data of some listed companies and customer churn data from a telecom company. Finally, it compares the classification performance of all these models with the SVM model and the SMOTE-SVM model. The results indicate that RU-SMOTE-SVM performs best, suggesting that RU-SMOTE-SVM is an effective model for classifying imbalanced data.
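The abstract does not spell out the RU-SMOTE-SVM procedure. As a rough illustration only, the sketch below assumes that "RU" means random undersampling of the majority class and that SMOTE is the standard technique of creating synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours. The function names `random_undersample`, `smote`, and `ru_smote` and the parameters `target_size`, `k`, and `seed` are hypothetical, and the SVM training step itself is left out.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def random_undersample(majority, n_keep, rng):
    """Keep a random subset of at most n_keep majority samples."""
    return rng.sample(majority, min(n_keep, len(majority)))


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority points by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [minority[j] for j in range(len(minority)) if j != i]
        others.sort(key=lambda p: _euclidean(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def ru_smote(majority, minority, target_size, k=5, seed=0):
    """Hypothetical RU + SMOTE step: shrink the majority class and grow the
    minority class toward target_size samples each. Labels: 0 = majority,
    1 = minority."""
    rng = random.Random(seed)
    kept = random_undersample(majority, target_size, rng)
    n_new = max(0, target_size - len(minority))
    grown = list(minority) + smote(minority, n_new, k, rng)
    X = kept + grown
    y = [0] * len(kept) + [1] * len(grown)
    return X, y


if __name__ == "__main__":
    majority = [(float(i), float(i % 7)) for i in range(100)]
    minority = [(0.5, 0.5), (1.0, 1.5), (1.5, 0.8), (2.0, 1.2)]
    X, y = ru_smote(majority, minority, target_size=30, k=3)
    print(sum(y), len(y) - sum(y))  # 30 30
```

With 100 majority and 4 minority points and `target_size=30`, the result contains 30 samples of each class. The balanced `X, y` could then be passed to any SVM implementation, for instance scikit-learn's `SVC().fit(X, y)`; whether the thesis balances the classes to exactly equal sizes is not stated in the abstract.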

Keywords/Search Tags:

imbalanced dataset, support vector machine, decision tree

Related items

1	Research Of Population Unbalance Risk Early-Warning Model Based On Support Vector Machine
2	Price Forecast Of IT Internet Online Courses
3	Based On MEMS Sensors Human Body Elderly Assistance And Rehabilitation Motion Gesture Recognition Research
4	Neuron Classification Based On Their Morphological Features
5	The Prediction Of Sailing Speed Based On Support Vector Machine
6	Application Of Support Vector Machine In The Analysis Of Population Data
7	Dual Sparse Support Vector Machine Based On Lasso
8	Mining Web-based Learning System Data To Detect Different Pattern Of The Student During Completing Course
9	A Class Of Online Algorithms For Support Vector Machine And Their Application
10	A Research On Learning Process Evaluation Based On Support Vector Machine