
Research On Support Vector Machine And Decision Tree Algorithm For Imbalanced Datasets

Posted on: 2019-11-24   Degree: Master   Type: Thesis
Country: China   Candidate: M J Hou
GTID: 2417330563958859   Subject: Applied statistics
Abstract/Summary:
Traditional machine learning methods assume that the number of samples tends to infinity. In practice the number of samples is finite, and even well-performing traditional methods give unsatisfactory results on such data. The support vector machine (SVM), which is based on statistical learning theory, addresses problems that affect traditional machine learning methods, such as nonlinearity, local extrema, small sample sizes, and high dimensionality.

In actual research, although abundant samples can be collected, the characteristics of the data themselves or external factors mean that only a small share of the data is really useful. Datasets in which one class contains far more samples than the others are called imbalanced datasets. Because the minority class lacks representative samples, the classification rules describing it are few and weak. If the data are left unprocessed, existing algorithms tend to favor the majority class, which sharply reduces classification accuracy on the minority class. Yet the minority class is most often the one of interest, because misclassifying a minority sample can have fatal consequences. It is therefore necessary to find effective classification methods for imbalanced datasets.

This thesis studies the nature of classification on imbalanced datasets, the processing methods, and the evaluation metrics. It also introduces the CBO-SVM, RU-SMOTE-SVM, and SMOTEBoosting-SVM models, as well as the C5.0 decision tree model, combined with statistical theory. It then carries out an empirical analysis of credit card customer default data from Taiwan, together with ST data from a number of listed companies and customer churn data from a telecom company. Finally, it compares the classification performance of all these models with that of the SVM and SMOTE-SVM models. The results indicate that RU-SMOTE-SVM performs best, which shows that it can be an excellent model for classifying imbalanced data.
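
The abstract does not spell out the resampling procedure, so the following is only a minimal sketch of the general idea, assuming that RU-SMOTE denotes random undersampling of the majority class followed by SMOTE-style oversampling of the minority class. The function names (`random_undersample`, `smote`, `ru_smote`), the `target` class size, and the toy data are all hypothetical and are not taken from the thesis. The balanced output would then be passed to an SVM, which is not shown here.

```python
import random
from math import dist


def random_undersample(majority, n_keep, rng):
    """Keep a random subset of n_keep majority samples (without replacement)."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority points (SMOTE-style interpolation).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def ru_smote(majority, minority, target, k=3, seed=0):
    """Assumed RU-SMOTE: shrink the majority and grow the minority to `target`."""
    rng = random.Random(seed)
    kept = random_undersample(majority, min(target, len(majority)), rng)
    extra = smote(minority, max(0, target - len(minority)), k, rng)
    return kept, minority + extra


# Hypothetical toy data: 200 majority points and 10 minority points in 2-D.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(10)]

maj_bal, min_bal = ru_smote(majority, minority, target=50)
print(len(maj_bal), len(min_bal))  # both classes now have 50 samples
```

Undersampling alone discards majority information and oversampling alone can overfit a handful of minority points; combining the two, as assumed here, is one common way to trade off these effects before training a classifier.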
Keywords/Search Tags: imbalanced dataset, support vector machine, decision tree