
Support Vector Machine Algorithm Research Based On Unbalanced Dataset

Posted on: 2022-05-02  Degree: Master  Type: Thesis
Country: China  Candidate: Y X Xu  Full Text: PDF
GTID: 2518306509489174  Subject: Applied Statistics
Abstract/Summary:
As one of today's most widely used algorithms, the support vector machine (SVM) has been applied in many fields. It rests on a complete theoretical foundation and performs well in data classification. An SVM can handle simple linearly separable datasets and, with a kernel function, can also classify linearly non-separable datasets well by mapping them into a high-dimensional space. Because many practical classification problems are linearly non-separable, choosing an appropriate kernel function and tuning the corresponding parameters is very important when using an SVM. This thesis uses the RBF kernel because it gives stable classification results on both large-sample and small-sample datasets, and its superiority has been demonstrated in a great many practical experiments. The SVM also has an important parameter C, the penalty coefficient, and the RBF kernel introduces a second hyper-parameter; these two parameters are the main quantities to be determined. The main purpose of this thesis is to find appropriate values for the parameter C and the kernel parameter.

The main idea is to build on the hill-climbing algorithm and, using the model-construction theory of regression analysis, to propose a new algorithm for selecting SVM parameters. Compared with heuristic algorithms, hill climbing is conceptually clearer and simpler, so it can save time on large-scale datasets. It has one problem that cannot be ignored, however: it converges so quickly that it can become trapped in a local optimum. To alleviate this, the thesis adds a model based on regression analysis that suggests the direction of the next parameter selection step when the algorithm falls into a local optimum.

With the development of science and technology, credit card fraud has become a problem for many banks in the financial transaction market, and the main application problem of this thesis is the identification of fraudulent transactions. In credit card fraud prediction, however, the number of defaulting customers in the customer data is often very small. Therefore, before classifying with the SVM, the unbalanced dataset must be resampled. The resampling technique chosen in this thesis is the SMOTE algorithm.
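The hill-climbing search over the two SVM parameters can be illustrated with a minimal sketch. It is not the thesis's algorithm: the regression-based step for escaping local optima is not described in enough detail in the abstract to reproduce, so only plain greedy hill climbing is shown. The names `hill_climb` and `toy_score`, the base-2 exponent grid, and the name `gamma` for the kernel parameter are all assumptions made for illustration; in practice `toy_score` would be replaced by a cross-validated score of an RBF-kernel SVM.

```python
import math
from typing import Callable, Tuple

Point = Tuple[int, int]  # (log2 of C, log2 of gamma) on an integer grid


def hill_climb(score: Callable[[float, float], float],
               start: Point = (0, 0),
               bounds: Tuple[int, int] = (-10, 10),
               max_steps: int = 100) -> Tuple[Point, float]:
    """Greedy hill climbing over the exponents of C and gamma.

    Moves to the best of the four axis neighbours while that improves the
    score, and stops at the first point with no better neighbour, which may
    be only a local optimum.
    """
    lo, hi = bounds
    current = start
    best = score(2.0 ** current[0], 2.0 ** current[1])
    for _ in range(max_steps):
        neighbours = [(current[0] + dc, current[1] + dg)
                      for dc, dg in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        neighbours = [p for p in neighbours
                      if lo <= p[0] <= hi and lo <= p[1] <= hi]
        scored = [(score(2.0 ** p[0], 2.0 ** p[1]), p) for p in neighbours]
        top_score, top_point = max(scored)
        if top_score <= best:
            break  # no neighbour improves the score: stop here
        current, best = top_point, top_score
    return current, best


def toy_score(C: float, gamma: float) -> float:
    """Hypothetical stand-in for a cross-validated score.

    It has a single peak at C = 8, gamma = 0.25, so the search cannot get
    stuck; a real validation score may have several local optima.
    """
    return -((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 2) ** 2)


point, value = hill_climb(toy_score)
print("best exponents (log2 C, log2 gamma):", point, "score:", value)
```

Searching over exponents rather than raw values is a common design choice for these parameters, since useful values of C and the kernel parameter typically span several orders of magnitude.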
Keywords/Search Tags: Support Vector Machine, Hill-Climbing Algorithm, Unbalanced Dataset
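The resampling step for the unbalanced dataset can be sketched in the same spirit. The following is a simplified, standard-library illustration of the interpolation idea behind SMOTE, not the implementation used in the thesis. The function name `smote`, its parameters, and the sample data are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def smote(minority: List[Vector], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors (e.g. fraudulent transactions).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region the minority class already occupies. Resampling of this kind is normally applied to the training data only, so that evaluation still reflects the original class imbalance.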