An Imbalanced Approach Towards Credit Card Fraud Detection Using Proximity Based Resampling And Classifier Ranking

Posted on: 2019-02-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: MAIRA ANIS
GTID: 1318330569987470
Subject: Management Science and Engineering
Abstract/Summary:
Every year, fraudsters cause losses of billions of dollars through credit card transactions. An efficient credit card fraud detection algorithm can reduce this financial risk to a minimum. The algorithms used to predict fraud rely heavily on data mining techniques. Designing a fraud detection system is particularly challenging because the data are unevenly distributed: this non-static distribution means that the data contain far more legitimate (non-fraudulent) transactions than fraudulent ones. Data with this property are called imbalanced data. Imbalanced credit card data usually cause the classifier to be overwhelmed by the majority class (legitimate transactions), producing a poor predictive model that fails to detect the class of interest (fraudulent transactions).

To overcome this problem, one possible solution is to apply preprocessing techniques at the data level. Preprocessing is a critical step in data mining because its output is fed directly to classification techniques to build a predictive model. Preprocessing includes data cleaning, data integration, data transformation, data resampling, etc. This thesis focuses on two of these aspects: data cleaning (noise removal) and data resampling (reduction and augmentation). Noisy data represent unusual variance or errors in the data that can severely hamper classification performance. Resampling techniques, in turn, are used to produce the training data for building a predictive model, and the quality of that model depends heavily on which samples are used to train it. These techniques reduce the majority class (under-sampling) or increase the minority class (over-sampling) to produce a balanced training set, and such balanced training sets yield a predictive model that can detect unseen fraudulent transactions. Resampling techniques are categorized into random and
informed resampling techniques. Random techniques resample the data randomly, whereas informed techniques use data distributions and proximity measures to decide which instances to eliminate or duplicate. Despite their benefits, random techniques can also jeopardize classifier performance, either by eliminating potentially useful information from the majority class or by making too many exact copies of minority instances, which overfits the classifier so that it cannot learn rules for unseen minority-class transactions. Informed techniques have drawbacks as well: they do not remove noisy samples, and they do not focus on the most critical regions of the data, e.g. the area near the decision boundary.

To address these drawbacks of random and informed techniques, this thesis proposes novel resampling approaches and describes a procedure for removing noisy samples from the data to increase the prediction accuracy of classifiers. Specifically, it aims to provide resampling approaches that avoid these drawbacks: (i) a novel under-sampling approach that eliminates the most similar patterns so as to preserve the original distribution of the data, and (ii) a novel over-sampling approach that avoids generating near-identical copies of minority-class instances. For this purpose, a new similarity measure, the average locally centered Mahalanobis distance, is used. This measure differs considerably from the proximity measures previously used for resampling: it takes a data-centric approach to finding the most critical samples, whereas other informed resampling techniques use similarity measures whose covariance matrix is centered on the data centroid (the mean of the data). This study is the first to use this measure. Moreover, majority-class elimination is carried out at two levels: for samples lying away from the borderline and for samples on the borderline. Similarly, a two-step procedure is adopted for over-sampling the minority class, and the samples are weighted
according to their proximity measure and how hard they are to learn. In this way, more samples are generated near the decision region, where they are most needed to increase the prediction accuracy for the minority class. The novel resampling approaches show promising results on the evaluation measures Area Under the Curve (AUC), F-measure and G-mean. The results indicate that the proposed resampling approaches are effective in dealing with imbalanced credit card data, achieving high recall values.

Once a credit card fraud predictive model has been developed, financial institutions deploy it using classification algorithms, which have been used for decades to detect credit card fraud. However, class imbalance degrades the performance of these classifiers, and their results diverge across different performance measures. Ranking the classification algorithms is therefore a cumbersome task, owing to the multifaceted results produced by the evaluation measures. Moreover, credit card fraud data are inherently imbalanced with a non-static imbalance ratio, so a classifier that works well for one imbalance ratio may not produce satisfactory results for another. Many studies in the literature rank such classifiers; this thesis, however, presents a framework that aims to determine the impact of class imbalance on classifier performance and to rank the classifiers according to the level of skewness. The classifiers are ranked from best to weakest using three Multi-Criteria Decision Making (MCDM) techniques. The results show that choosing the right classifier for a given data distribution helps increase the fraud-catching rate.
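The contrast drawn above between a covariance matrix centered on the data centroid and a data-centric, locally centered one can be illustrated with a small sketch. The abstract does not give the formula of the average locally centered Mahalanobis distance, so the local variant below is an assumption and not the thesis's measure: it simply takes the center and covariance from the query point's k nearest neighbours instead of from the whole data set. All function names and the ridge term `eps` are hypothetical, and the code is limited to 2-D points to stay dependency-free.

```python
import math

def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def covariance(points, center):
    """Population 2x2 covariance of `points` around an arbitrary `center`."""
    n = len(points)
    sxx = sum((p[0] - center[0]) ** 2 for p in points) / n
    syy = sum((p[1] - center[1]) ** 2 for p in points) / n
    sxy = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points) / n
    return [[sxx, sxy], [sxy, syy]]

def mahalanobis(x, center, cov, eps=1e-9):
    """Mahalanobis distance of x from center; eps keeps a singular cov invertible."""
    a, b = cov[0][0] + eps, cov[0][1]
    c, d = cov[1][0], cov[1][1] + eps
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = x[0] - center[0], x[1] - center[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(max(q, 0.0))

def global_mahalanobis(x, data):
    """Classical form: covariance centered on the data centroid."""
    mu = mean(data)
    return mahalanobis(x, mu, covariance(data, mu))

def local_mahalanobis(x, data, k=3):
    """Hypothetical local form: center and covariance come from x's k nearest
    neighbours in `data` (x is assumed not to be a member of `data`)."""
    neighbours = sorted(data, key=lambda p: math.dist(x, p))[:k]
    mu = mean(neighbours)
    return mahalanobis(x, mu, covariance(neighbours, mu))
```

With the corner points `[(0, 0), (2, 0), (0, 2), (2, 2)]`, the centroid is (1, 1) and the covariance is the identity, so `global_mahalanobis((3, 1), data)` is approximately 2. The local version depends only on the neighbourhood of the query, which is the property that lets a proximity measure focus on critical regions such as the borderline.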
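The weighting idea for over-sampling, where hard-to-learn minority samples near the decision region receive more synthetic neighbours, can be sketched as follows. The abstract does not specify the generation rule, so this sketch swaps in SMOTE-style linear interpolation between a minority sample and one of its minority neighbours, and it measures "hardness" as the share of majority points among a sample's k nearest neighbours. Both choices, and the function names, are assumptions for illustration only.

```python
import math
import random

def hardness_weights(minority, majority, k=3):
    """Weight each minority point by the share of majority points among its k nearest neighbours."""
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    raw = []
    for i, x in enumerate(minority):
        # Exclude the point itself (index i in `labelled`) from its own neighbourhood.
        others = [lp for j, lp in enumerate(labelled) if j != i]
        nearest = sorted(others, key=lambda lp: math.dist(x, lp[0]))[:k]
        raw.append(sum(label for _, label in nearest) / k)
    total = sum(raw)
    if total == 0:  # no minority point touches the majority class: use uniform weights
        return [1 / len(minority)] * len(minority)
    return [w / total for w in raw]

def weighted_oversample(minority, majority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points, favouring hard-to-learn seed points."""
    rng = random.Random(seed)
    weights = hardness_weights(minority, majority, k)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(minority)), weights=weights)[0]
        x = minority[i]
        # Pick one of x's k nearest minority neighbours (excluding x itself).
        peers = [p for j, p in enumerate(minority) if j != i]
        nb = rng.choice(sorted(peers, key=lambda p: math.dist(x, p))[:k])
        gap = rng.random()  # position along the segment from x to nb, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

With the toy minority set `[(0, 0), (1, 0), (5, 5)]` and majority set `[(5, 6), (6, 5), (6, 6), (0, 10)]`, the point (5, 5) is surrounded by majority points and receives weight 0.6, against 0.2 for each of the other two, so most synthetic points are seeded where the classes meet.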
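The evaluation measures named above, and the reason plain accuracy is avoided on imbalanced data, can be shown with a few lines computed from confusion-matrix counts. The formulas are the standard definitions (recall = TP/(TP+FN), G-mean = sqrt(recall × specificity), F-measure = harmonic mean of precision and recall); the function name and the counts are illustrative and not taken from the thesis. AUC is left out because it needs ranked classifier scores rather than counts.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Standard metrics for a binary problem where 'fraud' is the positive class."""
    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)          # fraud-catching rate (sensitivity)
    specificity = ratio(tn, tn + fp)     # share of legitimate transactions passed
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
    }

# A classifier that labels every transaction legitimate, on data with 1% fraud:
trivial = imbalance_metrics(tp=0, fn=100, fp=0, tn=9900)
# A classifier that catches most fraud at the cost of some false alarms:
useful = imbalance_metrics(tp=80, fn=20, fp=100, tn=9800)
print(trivial)
print(useful)
```

The trivial classifier reaches 0.99 accuracy but 0 recall, F-measure and G-mean, while the second reaches 0.988 accuracy with recall 0.8 and F-measure 4/7 (about 0.571); only the imbalance-aware measures separate the two.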
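The abstract does not name the three MCDM techniques used for ranking. As one common member of that family, the sketch below implements TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution): each classifier is a row, each evaluation measure is a column, and classifiers are ranked by their relative closeness to an ideal solution. The classifier names, equal weights and scores are invented for illustration and are not results from the thesis.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each row (alternative).

    matrix[i][j]: score of alternative i on criterion j
    weights[j]:   importance of criterion j
    benefit[j]:   True if larger is better for criterion j, False if smaller is better
    """
    n_crit = len(weights)
    # 1. Vector-normalise each column (guarding against all-zero columns), then weight it.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    # 2. Ideal and anti-ideal values per criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Relative closeness: distance to anti-ideal over total distance.
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        scores.append(d_neg / (d_pos + d_neg) if d_pos + d_neg else 0.0)
    return scores

# Hypothetical classifiers scored on AUC, F-measure and G-mean (all "higher is better").
names = ["clf_a", "clf_b", "clf_c"]
matrix = [[0.90, 0.80, 0.85],
          [0.70, 0.60, 0.65],
          [0.80, 0.70, 0.75]]
scores = topsis(matrix, weights=[1 / 3] * 3, benefit=[True] * 3)
ranking = [names[i] for i in sorted(range(len(names)), key=lambda i: -scores[i])]
print(ranking)
```

Here `clf_a` is best on every criterion and scores 1.0, `clf_b` is worst on every criterion and scores 0.0, so the ranking is `['clf_a', 'clf_c', 'clf_b']`. A method like this becomes informative in the situation the abstract describes, where the measures disagree and no classifier dominates.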
Keywords/Search Tags: Supervised Classification, Class Imbalance Learning, Resampling, Credit Card Fraud Detection