Support Vector Machine Algorithm Research Based On Unbalanced Dataset

Posted on:2022-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Xu

Full Text:PDF

GTID:2518306509489174

Subject:Applied Statistics

Abstract/Summary:

The support vector machine (SVM) is one of the most widely used algorithms today and has been applied in many fields. It rests on a complete theoretical foundation and performs well in data classification. An SVM can handle simple linearly separable datasets, and, with a kernel function, it can also classify linearly non-separable datasets well by mapping them into a high-dimensional space. Because many practical classification problems are linearly non-separable, choosing an appropriate kernel function and tuning its parameters is very important when using an SVM. This thesis uses the RBF kernel because it gives stable classification results on both large and small datasets, an advantage that has been confirmed by a large number of practical experiments. An SVM also has an important parameter C, the penalty coefficient, and the RBF kernel introduces a second hyper-parameter; these two parameters are the main quantities to be determined. The main purpose of this thesis is to find appropriate values of C and of the RBF kernel parameter. The central idea builds on the hill-climbing algorithm: combined with the model-construction theory of regression analysis, it yields a new algorithm for selecting SVM parameters. Compared with heuristic algorithms, hill climbing is conceptually clearer and simpler to implement, so it can save time on large-scale datasets. However, hill climbing has a drawback that cannot be ignored: it converges so quickly that it can become trapped in a local optimum. This thesis therefore adds a model based on regression analysis to alleviate the problem and to suggest a direction for the next parameter choice when the algorithm is trapped in a local optimum.

With the development of science and technology, credit card fraud has become a problem for many banks in the financial transaction market, and the application studied in this thesis is the identification of fraudulent transactions. In credit card fraud prediction, however, defaulting customers usually make up only a very small share of the customer data. The unbalanced dataset therefore has to be resampled before the SVM is used for classification. The resampling technique chosen in this thesis is the SMOTE algorithm.
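The two building blocks named in the abstract, SMOTE-style oversampling and hill climbing over the two SVM parameters, can be illustrated with a minimal standard-library sketch. It is not the thesis's implementation: the function names (`smote`, `hill_climb`, `toy_score`), the log2 parameter grid, and the name "gamma" for the RBF kernel parameter are assumptions made for illustration, and the regression-based step for escaping local optima is not reproduced. The toy score stands in for what would in practice be a cross-validated SVM score.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point is an interpolation
    between a minority sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority points
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (q - b) for b, q in zip(base, target)))
    return synthetic


def hill_climb(score, start=(0, 0), step=1, max_iter=100):
    """Greedy search over (log2 C, log2 gamma): move to the best of the four
    axis neighbours until no neighbour strictly improves the score."""
    current, best = start, score(*start)
    for _ in range(max_iter):
        c, g = current
        neighbours = [(c + step, g), (c - step, g), (c, g + step), (c, g - step)]
        candidate = max(neighbours, key=lambda p: score(*p))
        cand_score = score(*candidate)
        if cand_score <= best:
            break  # local optimum: plain hill climbing stops here
        current, best = candidate, cand_score
    return current, best


def toy_score(log2_c, log2_gamma):
    """Hypothetical stand-in for cross-validated accuracy,
    peaked at log2 C = 3 and log2 gamma = -2."""
    return -((log2_c - 3) ** 2 + (log2_gamma + 2) ** 2)


best_params, best_score = hill_climb(toy_score)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6)
```

On a single-peaked score such as `toy_score` the search reaches the peak, but on a multi-modal score the `break` is exactly where plain hill climbing gets stuck, which is the situation the thesis's regression-based model is meant to address.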

Keywords/Search Tags:

Support Vector Machine, Hill-Climbing Algorithm, Unbalanced Dataset

Related items

1	Study Of Support Vector Machine Algorithms On Unbalanced Dataset
2	Research On Algorithm And Its Application Based On Support Vector Machine
3	Research And Application Of Active Learning Method For Unbalanced Data Set Based On One Class SVM
4	Research On Imbalanced Data Classification Algorithm Based On Improved RBO Algorithm And SVM Algorithm
5	Research And Application Of Heterogeneous Weighted Support Vector Machine Algorithm
6	Studies Of Several Mathematical Models And Algorithms Of Support Vector Machine
7	Support Vector Machine Based Classification Models And Algorithms Research For Imbalanced Data
8	Support Vector Machine Based Classification Algorithms Research For Imbalanced Data
9	Research On The Max-Min Hill-Climbing Algorithm For Bayesian Network Structure Learning
10	Unbalanced Data Classification Algorithm Based On SVM For Research And Application