
Research On Problems Existing In Credit Card Data Fraud Examination And Solutions

Posted on: 2021-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Hu
Full Text: PDF
GTID: 2428330611469760
Subject: Applied statistics
Abstract/Summary:
With the promotion and popularization of credit cards, more and more people enjoy consuming in advance and paying without paper, which is a great convenience for consumers and businesses. However, fraud has accompanied credit cards since their birth, and billions of euros are lost to credit card fraud every year. Financial institutions therefore urgently need a fraud detection system that performs well enough to replace traditional manual visual inspection. Credit card data, however, suffers from severe class imbalance and from concept drift in the data stream: fraudulent transactions are only a small minority, and models trained on lagging data cannot recognize ever-changing fraud patterns. Both properties degrade fraud prediction.

This thesis proposes two solutions. (1) To address data imbalance, clustering is fused with integrated undersampling to produce an undersampling algorithm that balances the advantages and disadvantages of both while retaining the diversity and richness of the majority-class samples. Then, to address the weakness of the MAHAKIL oversampling algorithm in small-sample extraction, MAHAKIL is also fused with clustering so that the resulting algorithm no longer has this drawback. Finally, the two are combined into a mixed sampling method of undersampling and oversampling. (2) To reduce the impact of concept drift in the data stream, active learning is used. The efficiency of active learning depends on whether an efficient and accurate sampling strategy is available. This thesis uses the QUIRE algorithm, whose goal is to select examples that are both representative and highly uncertain. Most importantly, the algorithm is sensitive to the sample distribution and to class boundaries, so it can select useful instances even when the data set contains outliers.

Finally, a large number of comparative experiments are conducted on real credit card fraud data to test whether the two proposed schemes are superior, and conclusions are drawn.
Keywords/Search Tags:data imbalance, concept drift, mixed sampling, active learning, MAHAKIL algorithm, QUIRE algorithm
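
The idea of fusing clustering with undersampling, as described in solution (1) of the abstract, can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify in detail. It assumes plain k-means as the clustering step and a proportional per-cluster quota as the sampling rule; the function names `kmeans` and `cluster_undersample` and all parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples (assumed clustering step)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority samples, drawing from every cluster.

    Each cluster contributes in proportion to its size (at least one sample),
    so no region of the majority class disappears entirely. The proportional
    quota is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept
```

Compared with uniform random undersampling, sampling per cluster is one simple way to keep the diversity of the majority class, since sparse regions are still represented after the reduction. Because of rounding, the number of samples returned can differ from `n_keep` by up to `k`.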