
Statistical Research On Identifying Transaction Risks Based On Consumption Process Data

Posted on: 2021-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: L F Zheng
GTID: 2437330602498148
Subject: Applied Statistics
Abstract/Summary:
With the continuous development of internet technology, commercial trade has moved from offline to online, and people increasingly shop online. Online transactions bring users and businesses convenience, efficiency and economic benefits. At the same time, loopholes in online consumption keep multiplying, which raises the risk of online transactions. The online "black industry", which uses illegal means to profit from internet transactions, is a new type of transaction risk in online consumption. With continuing advances in big data and data mining technology, it has become possible to identify such transaction risks. In particular, studying and identifying the black industry not only protects the normal operation of businesses but also makes the trading experience of natural (genuine) users smoother.

In this paper, we use a random forest model and a LightGBM model to distinguish natural users from black-industry users, based on consumption process data provided by a Data Castle competition. Because the data are imbalanced, the SMOTE oversampling algorithm is used to balance them. To compare the effect of imbalanced data on model training, models are built on the data both before and after balancing, and each model is tested with 5-fold cross-validation. Finally, the models are assessed with evaluation metrics suited to imbalanced data.

The results show that the SMOTE-based random forest and LightGBM models achieve good classification results: the overall prediction accuracy exceeds 97%, the accuracy on majority-class examples is 99%, and the other metrics all exceed 95%. Compared with classifying the imbalanced data directly, the overall accuracy improves by 1% and the accuracy on majority-class examples improves by 32%.
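The abstract names the balancing step and the validation scheme but gives no implementation details. As a rough, dependency-free illustration only, the sketch below shows a SMOTE-style oversampler (each synthetic point is interpolated between a minority sample and one of its k nearest minority neighbours), a simple stratified 5-fold split, and per-class recall as one possible reading of "accuracy on majority-class examples". All function names (`smote`, `balance`, `stratified_folds`, `per_class_recall`), the toy data, k = 5, random choice of base samples, and oversampling only the training part of each fold are assumptions, not the thesis's code. The abstract does not say whether SMOTE was applied before or inside cross-validation; applying it only to training folds keeps synthetic points out of the test folds. A real pipeline would more likely use the `SMOTE` class from imbalanced-learn with scikit-learn's `RandomForestClassifier` or LightGBM's `LGBMClassifier`.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each new point is x + u * (nn - x): x is a randomly chosen minority sample,
    nn is one of x's k nearest minority neighbours, and u is drawn from U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng if rng is not None else random.Random()
    k = min(k, len(minority) - 1)
    neighbours = [_k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x, nn = minority[i], minority[rng.choice(neighbours[i])]
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic


def balance(X: Sequence[Point], y: Sequence[int], minority_label: int,
            k: int = 5, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Oversample the minority class until both classes have equal counts.

    Assumes binary labels; synthetic rows are appended after the originals.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new = smote(minority, max(0, n_majority - len(minority)), k, random.Random(seed))
    return list(X) + new, list(y) + [minority_label] * len(new)


def stratified_folds(y: Sequence[int], n_folds: int = 5,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs with roughly equal class ratios."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)  # deal each class's rows round-robin
    all_idx = set(range(len(y)))
    return [(sorted(all_idx - set(f)), sorted(f)) for f in folds]


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Fraction of each true class's examples that were predicted correctly."""
    recall = {}
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recall[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return recall


if __name__ == "__main__":
    # Hypothetical toy data: 3 minority rows labelled 1, 9 majority rows labelled 0.
    X = [(float(i), float(i)) for i in range(3)] + [(10.0 + i, 0.0) for i in range(9)]
    y = [1] * 3 + [0] * 9
    for train_idx, test_idx in stratified_folds(y, n_folds=5):
        # Oversample only the training part of each fold; the test fold stays untouched.
        X_tr, y_tr = balance([X[i] for i in train_idx], [y[i] for i in train_idx],
                             minority_label=1)
        print(f"test={len(test_idx)} majority={y_tr.count(0)} minority={y_tr.count(1)}")
```

In each fold, a classifier such as a random forest or LightGBM model would then be fitted on `X_tr, y_tr` and scored on the untouched test fold, with `per_class_recall` comparing the true and predicted labels class by class.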
Keywords/Search Tags: Imbalanced data, SMOTE oversampling, LightGBM model, Random forest model