| In the era of big data, the economy and Internet technology are developing rapidly, and online trading volume has exploded. This growth in data has also drawn criminals to the online trading market, making fraud frequent. Depending on the availability of data labels, fraud detection is currently divided into supervised, unsupervised, and semi-supervised detection. Data sets in fraud scenarios are extremely imbalanced and high-dimensional. Supervised classification algorithms therefore usually require sampling or dimensionality reduction to preprocess the original data, and this heavy manual intervention may distort the distribution of the original samples and make the predictions inaccurate. In the semi-supervised setting, scholars mainly use autoencoder algorithms that learn only from normal transaction samples, which to some extent wastes the information in fraudulent samples. To address the inaccurate predictions caused by the tedious, time-consuming manual preprocessing that supervised machine learning algorithms require, this research applies the autoencoder fraud detection algorithm to supervised data and proposes a new pseudo twin autoencoder fraud detection model that trains on both positive and negative samples with deep neural networks. The model is highly robust and requires neither resampling nor dimensionality reduction of the original samples. While preserving the original characteristics of the data, it uses the information in the full sample and extends the application of the semi-supervised autoencoder. In this study, we construct four simulated data sets (low-dimensional balanced, low-dimensional imbalanced, high-dimensional balanced, and high-dimensional imbalanced) and use the autoencoder model, the pseudo twin autoencoder model, and traditional machine learning models to conduct multiple simulation
comparison experiments. The experimental results show that, compared with the traditional autoencoder model, the pseudo twin autoencoder model improves AUC by about 1 percentage point. Compared with the logistic regression classifier, AUC generally increases by about 1.5 percentage points. Compared with the ensemble classification model LightGBM, AUC improves in some cases. In addition, we analyze how the AUC of the pseudo twin autoencoder varies with the hyperparameter α. The analysis shows that the AUC trend differs across data sets: on balanced samples AUC peaks when α is about 0.5, and on imbalanced samples it peaks when α is about 0.8. To further verify the practical effect of the pseudo twin autoencoder model in fraud scenarios, a credit card transaction data set and the Vesta online transaction data set were selected for verification. On the credit card transaction data set, the pseudo twin stacked autoencoder model achieves an AUC of 0.9769, four percentage points higher than the ensemble model LightGBM. On the high-dimensional Vesta online transaction data set, the pseudo twin stacked autoencoder model achieves an AUC of 0.8260, outperforming the conventional stacked autoencoder model and the machine learning models. These results indicate that the pseudo twin autoencoder model can be applied to supervised fraud detection and related fields to improve the efficiency of identifying fraudulent samples. |
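
The abstract reports AUC as a function of the hyperparameter α but does not state how α enters the model. The following is a minimal sketch under a stated assumption: one autoencoder is trained on normal samples, another on fraudulent samples, and α weights their reconstruction errors into a single fraud score. The fusion rule `combined_score`, the helper `sweep_alpha`, and the idea that the errors arrive as plain lists are all hypothetical and serve only to illustrate how an α sweep against AUC could be run; they are not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def combined_score(err_normal: float, err_fraud: float, alpha: float) -> float:
    """Hypothetical fusion rule (an assumption, not taken from the paper).

    A transaction looks more fraudulent when the autoencoder trained on
    normal samples reconstructs it badly (high err_normal) and the one
    trained on fraudulent samples reconstructs it well (low err_fraud).
    """
    return alpha * err_normal - (1.0 - alpha) * err_fraud


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the sample size, which
    is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sweep_alpha(
    labels: Sequence[int],
    errs_normal: Sequence[float],
    errs_fraud: Sequence[float],
    alphas: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (alpha, AUC) pairs for each candidate alpha."""
    results = []
    for a in alphas:
        scores = [
            combined_score(en, ef, a)
            for en, ef in zip(errs_normal, errs_fraud)
        ]
        results.append((a, auc(labels, scores)))
    return results
```

Given per-sample reconstruction errors from the two autoencoders on a validation set, `sweep_alpha(labels, errs_normal, errs_fraud, [0.0, 0.1, ..., 1.0])` would yield the kind of α-versus-AUC curve the abstract describes; under this assumed rule, α = 1 reduces to scoring with the normal-sample autoencoder alone.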