
Study On The SVDD Algorithm And Its Application In Credit Card Fraud Detection

Posted on: 2011-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Liu
Full Text: PDF
GTID: 2178360302993752
Subject: Computer application technology
Abstract/Summary:
As credit card fraud grows worldwide, new fraud techniques are constantly being devised. The main forms are fraudulent use of another person's credit card, counterfeiting of credit cards, and use of invalid credit cards. The losses caused by credit card fraud keep increasing, and identifying fraudulent transactions effectively, quickly, and accurately has become a common concern across the financial sector.
Data mining technology provides intelligent detection methods for credit card fraud. Most existing fraud detection models apply classification algorithms such as Bayesian classification, decision trees, and neural networks. Models based on these three types of algorithms have the following problems: the algorithms are supervised, so some new types of fraud cannot be detected; the data sets the models require are confidential banking information, so representative fraud samples are very hard to obtain; and because legitimate transactions far outnumber fraudulent ones, the class imbalance is severe and can lead to misclassification and low classification accuracy. To address these problems, this thesis proposes applying a one-class classification method, support vector data description (SVDD), to the fraud detection model. One-class classification methods are unsupervised: they build a model using data from only one class, so some new types of fraud can be detected and the class imbalance problem is avoided. The two key points of this thesis are making the SVDD algorithm more efficient and applying it to the fraud detection model.
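For orientation, the SVDD idea can be written as the soft-margin enclosing-sphere problem that is standard in the one-class classification literature. The formulation below is not quoted from the thesis, and the symbols (sphere center $a$, radius $R$, slack variables $\xi_i$, feature map $\phi$, training points $x_1, \dots, x_n$) are notation assumed here.

```latex
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n .
\end{aligned}
% A new point z is accepted as belonging to the target class when
% \lVert \phi(z) - a \rVert^2 \le R^2, and flagged as an outlier otherwise.
```

Only target-class samples (here, legitimate transactions) enter the constraints, which is why no labeled fraud examples are needed to fit the boundary. The regularization parameter $C$ trades the size of the sphere against the number of training points left outside it.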
The contributions of this thesis are as follows:
(1) Analyze the support vector data description algorithm, identify its advantages and disadvantages, and point out its strengths and weaknesses when it is used to build a credit card fraud detection model.
(2) Propose a new classification algorithm that combines k-means clustering with an improved SVDD (KmD-SVDD). Following the ideas of divide-and-conquer and parallel computing, it first divides the whole data set into k clusters using the k-means clustering algorithm. It then trains the k clusters in parallel with the improved SVDD. Finally, it trains on the k resulting local support vector sets to obtain the final overall decision boundary. The thesis also discusses how the number of clusters k affects training time and gives a method for choosing k. Experiments on synthetic and real data show that the proposed method is efficient and achieves high classification accuracy.
(3) Propose a parameter optimization method for the KmD-SVDD algorithm based on the ant colony algorithm. Experiments show that the regularization parameter C and the kernel parameter of KmD-SVDD strongly influence its performance, so the ant colony algorithm is applied to optimize them and further improve classification accuracy. First, the number of significant digits of the two parameters is determined from expert experience, and the values of the two parameters are represented by node values in the ant colony system. Ants leave pheromone on each node they traverse. The average classification error rate from k-fold cross-validation is used as the objective function value for updating the pheromone concentration. The path finally found therefore represents the model with the highest accuracy. Compared with using cross-validation directly, the proposed method further improves classification accuracy.
(4) Propose a two-stage credit card fraud detection model based on the KmD-SVDD algorithm. The model uses a group of cardholders with similar consumption behavior in place of a single cardholder, which overcomes the shortage of transaction data for any single cardholder. The validity of the model is verified by comparison with other models.
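The divide-and-conquer flow described in contribution (2) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the thesis's KmD-SVDD: the "improved SVDD" is not specified in the abstract, so the sketch swaps in an approximate minimum enclosing ball (the hard-margin, linear-kernel special case of SVDD, computed with the Badoiu-Clarkson iteration), and it treats points near a local ball's surface as stand-ins for local support vectors. All function names and the `tol` parameter are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def enclosing_ball(points, iters=200):
    """Approximate minimum enclosing ball (Badoiu-Clarkson iteration).

    Stand-in for SVDD training: the radius is set to the largest distance
    from the final center, so every input point lies inside the ball.
    """
    c = tuple(points[0])
    for i in range(1, iters + 1):
        far = max(points, key=lambda p: math.dist(p, c))
        c = tuple(ci + (fi - ci) / (i + 1) for ci, fi in zip(c, far))
    r = max(math.dist(p, c) for p in points)
    return c, r


def local_boundary(cluster, tol=0.05):
    """Points within tol of the local ball's surface (assumed proxy for local support vectors)."""
    c, r = enclosing_ball(cluster)
    return [p for p in cluster if math.dist(p, c) >= (1 - tol) * r]


def kmd_sketch(points, k):
    """Cluster, fit each cluster separately, then refit on the pooled boundary points."""
    clusters = kmeans(points, k)
    # Threads only illustrate the parallel structure; CPU-bound work would need processes.
    with ThreadPoolExecutor() as pool:
        local_sets = list(pool.map(local_boundary, clusters))
    pooled = [p for s in local_sets for p in s]
    return enclosing_ball(pooled)


def is_outlier(x, center, radius):
    """Flag a point that falls outside the final description."""
    return math.dist(x, center) > radius


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    center, radius = kmd_sketch(data, k=3)
    print(center, radius, is_outlier((100.0, 100.0), center, radius))
```

The final stage sees only the pooled boundary points, which is where the training-time saving comes from. In this simplified version the final ball is an approximation and is not guaranteed to enclose every original training point.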
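The digit-path search in contribution (3) can be illustrated with a small ant colony loop. This is a minimal sketch under assumptions: the abstract does not give the pheromone update rule, so the evaporation rate `rho` and the deposit `1 / (1 + error)` are choices made here, as are all function and parameter names. The `objective` argument is a hypothetical stand-in for the k-fold cross-validation error of a trained model, and the example uses a toy function in its place.

```python
import random


def decode(digits, int_digits):
    """Read a digit path as a decimal number, e.g. [1, 2, 5] with int_digits=2 is 12.5."""
    return int("".join(map(str, digits))) / 10 ** (len(digits) - int_digits)


def aco_search(objective, digits_per_param=3, int_digits=2,
               n_ants=20, n_iters=30, rho=0.3, seed=0):
    """Search for (C, kernel_param) by letting ants pick one digit per path position."""
    rng = random.Random(seed)
    n_pos = 2 * digits_per_param
    # tau[pos][d] is the pheromone on digit value d at path position pos.
    tau = [[1.0] * 10 for _ in range(n_pos)]
    best_path, best_err = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            path = [rng.choices(range(10), weights=tau[pos])[0] for pos in range(n_pos)]
            C = decode(path[:digits_per_param], int_digits)
            kernel_param = decode(path[digits_per_param:], int_digits)
            if C <= 0 or kernel_param <= 0:
                continue  # skip invalid parameter values
            err = objective(C, kernel_param)
            if err < best_err:
                best_path, best_err = path, err
        # Evaporate everywhere, then reinforce the best path found so far.
        for pos in range(n_pos):
            tau[pos] = [(1 - rho) * t for t in tau[pos]]
            if best_path is not None:
                tau[pos][best_path[pos]] += 1.0 / (1.0 + best_err)
    if best_path is None:
        raise RuntimeError("no valid parameter pair was sampled")
    return (decode(best_path[:digits_per_param], int_digits),
            decode(best_path[digits_per_param:], int_digits),
            best_err)


def toy_error(C, kernel_param):
    """Toy objective used only for illustration; replace with a cross-validation error."""
    return abs(C - 12.5) / 100 + abs(kernel_param - 3.2) / 100


if __name__ == "__main__":
    print(aco_search(toy_error))
```

With three digits per parameter and two integer digits, each parameter ranges over 0.1 to 99.9 in steps of 0.1. In a real setting the number of digits would be fixed in advance, as the abstract describes, from expert experience.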
Keywords/Search Tags:data mining, fraud detection, one-class classification, support vector data description, k-means clustering, parameter optimization