
Improvement And Application Of Random Forest Algorithm In Credit Card Fraud Detection

Posted on: 2021-02-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Q Luo | Full Text: PDF
GTID: 2428330611469758 | Subject: Applied statistics
Abstract/Summary:
The technological revolution drives innovation in information technology, and the rapid development of information technology keeps changing how people live and raising living standards. It has also caused the volume of data of all kinds to grow exponentially, bringing us into the era of big data. Massive data contains a great deal of important information, and machine learning can extract useful knowledge from it to support decision-making. Random forest, an important and widely used machine learning method, is an ensemble classifier whose good performance has led to its adoption in many fields, and fraud detection is one of the current research hotspots.

Random forest still has shortcomings. Randomly selecting features reduces the correlation among the trees, but it also reduces the strength of the trained model. On imbalanced classification problems, a standard random forest cannot predict the minority class accurately. These weaknesses make the optimization of random forest a valuable research topic. This thesis studies that optimization from two perspectives, feature selection and imbalanced classification, using credit card fraud detection data.

First, within the random forest, feature correlation is computed with the chi-square statistic and used to rank the features. The ranked features are then divided into two intervals, each interval is sampled separately, and feature selection is completed using linear combinations of the features. Second, for the class imbalance problem, the balanced and weighted variants of random forest are combined into a balanced weighted random forest, which mitigates the drawbacks of resampling and of cost-sensitive learning. The improved class imbalance algorithm is evaluated experimentally alongside the improved feature selection, and the F1 score is used to compare the results. Finally, the thesis summarizes the improvements to random forest feature selection and imbalanced classification and points out directions for follow-up research.
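The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no details, so the binary features, the midpoint split between the two intervals, the per-interval sample sizes, and all function names (`chi_square`, `select_features`, `balanced_bootstrap`, `f1_score`) are assumptions made for illustration. The linear combination of features and the class weighting are not shown, because the abstract does not specify them.

```python
import random


def chi_square(feature, labels):
    """Chi-square statistic of a binary feature against a binary label (2x2 table)."""
    n = len(labels)
    obs = [[0, 0], [0, 0]]  # obs[feature value][label]
    for f, y in zip(feature, labels):
        obs[f][y] += 1
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row_total = obs[f][0] + obs[f][1]
            col_total = obs[0][y] + obs[1][y]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (obs[f][y] - expected) ** 2 / expected
    return stat


def select_features(columns, labels, n_high, n_low, rng):
    """Rank features by chi-square, split the ranking into two intervals,
    and sample each interval separately."""
    scores = {name: chi_square(col, labels) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    cut = len(ranked) // 2  # assumed split point: the midpoint of the ranking
    high, low = ranked[:cut], ranked[cut:]
    picked_high = rng.sample(high, min(n_high, len(high)))
    picked_low = rng.sample(low, min(n_low, len(low)))
    return picked_high + picked_low


def balanced_bootstrap(labels, rng):
    """Bootstrap the minority class and draw an equal-sized bootstrap from the
    majority class. Assumes both classes are present in `labels`."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    k = len(minority)
    return ([rng.choice(minority) for _ in range(k)]
            + [rng.choice(majority) for _ in range(k)])


def f1_score(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, each tree would be trained on the indices returned by `balanced_bootstrap` and on the feature subset returned by `select_features`, so that higher-ranked and lower-ranked features are both represented in every tree.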
Keywords/Search Tags:Random Forest, Fraud Detection, Unbalanced Classification