With the continuous development of information technology, the Internet and financial industries have become increasingly intertwined, and Internet finance companies of many kinds have sprung up. Internet finance has broken with the traditional lending practices of banks, lowered the barriers to financing for borrowers, and enabled people to obtain financing directly online. It has also given rise to many Internet microfinance companies and Internet financial platforms, and Internet lending continues to expand. At the same time, new methods of financial fraud keep emerging and negative news about the sector persists, so risk control remains a major problem for the Internet finance industry. Today, the rapidly growing Internet finance business faces three main risks: credit risk, technology risk, and regulatory risk. Credit risk is the most important of the three and directly determines the overall level of risk control in Internet finance. Fraud risk is the main manifestation of credit risk, so reducing the proportion of credit fraud at financial institutions such as commercial banks and improving their anti-fraud systems is an urgent task.

Drawing on domestic and international research on anti-fraud systems, this paper first combines statistics and machine learning to propose an anti-fraud system based on data mining methods. The system consists of four parts: a data preprocessing module, a model identification module, an output module, and a monitoring module. Then, for credit card transactions in Internet consumer finance, the behavioral characteristics of fraudulent and non-fraudulent users are studied, and the severe imbalance between positive and negative samples is resolved by comparing random downsampling with the synthetic minority oversampling technique (SMOTE). Next, experiments are carried out with a variety of machine learning methods, including naive Bayes, support vector machine, logistic regression, K-nearest neighbor, decision tree, random forest, and a simple neural network, and each model is tuned by grid search. Comparing the evaluation metrics and learning ability of the models shows that logistic regression combined with SMOTE effectively distinguishes fraudulent from non-fraudulent users and reduces the proportion of credit fraud, allowing better control of credit risk.
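
The two resampling strategies compared in this work, random downsampling of the majority class and synthetic minority oversampling (SMOTE), can be illustrated with a minimal sketch. It is not the implementation used in the paper: it relies only on the Python standard library, the function names `random_downsample` and `smote` are hypothetical, the toy data are invented for illustration, and it assumes the minority class contains at least two points.

```python
import math
import random


def random_downsample(majority, n_keep, rng):
    """Random downsampling: keep a random subset of the majority class."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority point and one of its k nearest minority
    neighbors (assumes len(minority) >= 2)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


rng = random.Random(0)
# Hypothetical toy data: 3 fraud (minority) and 12 non-fraud (majority) points.
fraud = [(5.0, 5.0), (5.5, 4.5), (6.0, 5.5)]
normal = [(rng.uniform(0, 3), rng.uniform(0, 3)) for _ in range(12)]

# Option 1: shrink the majority class to the size of the minority class.
downsampled_normal = random_downsample(normal, len(fraud), rng)
# Option 2: grow the minority class to the size of the majority class.
oversampled_fraud = fraud + smote(fraud, len(normal) - len(fraud), k=2, rng=rng)
```

Downsampling discards majority-class records, while SMOTE keeps every record and adds interpolated minority points, which is one plausible reason the two can lead to different model performance. In either case, resampling would normally be applied to the training split only, so that evaluation reflects the original class distribution.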