As daily life moves increasingly online, the volume of online transaction data is growing rapidly. At the same time, financial fraud has risen sharply, causing heavy losses for financial institutions. Combining the advantages of supervised and unsupervised learning, this paper explores three open problems in fraud identification. First, class imbalance. Building on the idea behind the DBSMOTE algorithm, we propose the GMM-SMOTE algorithm, which oversamples the positive samples by linear interpolation. We design a comparison experiment, which verifies that DBSMOTE performs better than GMM-SMOTE on this data set. Second, covariate shift. To address it, we introduce transfer learning; we conjecture and verify that probability density ratio estimation (based on a Kullback-Leibler divergence algorithm) handles covariate shift more effectively than the machine learning confirmation method, which improves model performance. Third, timeliness. Fraud schemes keep changing over time, and unsupervised and supervised learning have complementary advantages in outlier detection. This paper combines the CatBoost algorithm with the strengths of Isolation Forest to improve recall in fraud detection, and designs comparison experiments across Isolation Forest, Hybrid Isolation Forest, and Extended Isolation Forest. The results show that the Hybrid Isolation Forest algorithm performs better in fraud detection. For feature selection, in addition to conventional filtering on correlation and outliers, we apply the concept of temporal consistency and drop variables whose predictive performance is poor across time spans. For feature construction, we consider sliding time window features and grouped aggregation features.
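The linear-interpolation step used for oversampling the positive class can be illustrated with a minimal sketch. This is plain SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours; it does not reproduce the way GMM-SMOTE or DBSMOTE choose which samples to interpolate. The function name `interpolate_minority` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def interpolate_minority(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    Assumption: X_min holds only minority-class rows (at least two).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its k nearest neighbours at random.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # New point = base + gap * (neighbour - base), with gap in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])
```

Each synthetic point lies on the segment between two existing minority samples, so the new samples stay inside the region already occupied by the positive class.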