| In recent years, against the background of an economic downturn and COVID-19, banks have also begun to implement online payment. Although this has alleviated the impact of the epidemic, it has also created opportunities for criminals to commit fraud. Fraud prediction has therefore become an important and meaningful topic. In this field, the imbalanced nature of fraudulent transaction data reduces prediction accuracy to a certain extent. To address the data imbalance problem, this paper adopts several sampling algorithms to balance the data. A payment fraud prediction model is built on a random forest classifier, and experiments are conducted on bank card transaction data and credit data, which demonstrate the effectiveness of certain sampling algorithms in actual prediction. The main contents of this paper are as follows. First, the bank card transaction data are preprocessed, mainly through correlation analysis, deletion and filling of missing values, and feature encoding. To handle the class imbalance, the SMOTE algorithm, the Borderline SMOTE algorithm, and the Generative Adversarial Nets (GAN) algorithm are used to generate additional data and balance the dataset. Next, to avoid over-fitting, the random forest algorithm is used for classification, and several important parameters of the random forest classifier are tuned to maximize payment fraud prediction performance. Finally, cross-validation is used to evaluate the performance of the sampling algorithms; these algorithms are then used to balance the data, and predictions are made on the real prediction dataset, which improves payment fraud prediction. After prediction on the bank card transaction data, this paper generalizes these data-balancing algorithms to credit data and verifies their effectiveness in improving fraud prediction. |
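
As a rough illustration of the oversampling step described above, the following sketch implements the core interpolation idea behind SMOTE using only the Python standard library. It is not the implementation used in this paper: the function name `smote`, its parameters (`n_synthetic`, `k`, `seed`), and the toy data are assumptions made for illustration, and Borderline SMOTE, GAN-based generation, and the random forest classifier are not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 8 legitimate vs 3 fraudulent transactions.
legit = [[float(i), float(i % 3)] for i in range(8)]
fraud = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Generate enough synthetic fraud samples to match the majority class size.
new_fraud = smote(fraud, n_synthetic=len(legit) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
print(len(legit), len(balanced_fraud))
```

Because every synthetic point lies on a segment between two existing minority samples, the generated records stay inside the region already occupied by the fraud class. In practice, oversampling of this kind would be applied only to the training folds during cross-validation, so that synthetic samples do not leak into the evaluation data.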