Research On User Identification Of SMS Fraud Based On SPARK And Random Forest

Posted on: 2019-11-16 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Y Yue | Full Text: PDF
GTID: 2438330563957613 | Subject: Instrumentation engineering
Abstract/Summary:
With the rapid development of information technology and the advent of the data era, telecommunications fraud is on the rise. New fraud tactics keep emerging and the fraud rate has increased significantly, seriously endangering people's lives and property. Among these tactics, SMS fraud is one of the longest-lasting, most varied, and hardest to eradicate. It is therefore very important to identify fraudulent users before they commit fraud and to act promptly when problems arise. Doing so stops crimes in time, protects people's lives and property, and gives public security organs and other judicial organs new ideas for detection.

The telecommunications industry has a long history and a large volume of business data, and a model built on this data must be updated and iterated frequently. For these reasons, the distributed computing platform SPARK, which offers in-memory caching and efficient iterative computation, was selected, and the random forest algorithm was parallelized on it to reduce the overall running time of the model. To address the loss of classification accuracy that class imbalance in SMS fraud user data causes in random forests, a method that generates feature subspaces by stratified sampling is proposed. To avoid the drawbacks of the traditional random forest algorithm, a weighting scheme is adopted: the performance of each decision tree is evaluated on out-of-bag data, this performance determines the weight of the tree, and the weighted trees produce the final classification result. On this basis, a mining model built on SPARK and the Stratified Subspace Weighted Random Forest algorithm was designed to meet the business needs of SMS fraud user identification and the characteristics of industry data.

The application results show that generating subspaces by stratified sampling effectively counters the drop in random forest accuracy under class imbalance. The parallel platform reduces the training and testing time of the model and improves efficiency. Compared with other classification algorithms, the model performs better and achieves higher parallel efficiency. It identifies telecommunications SMS fraud users with an accuracy of up to 90%.
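The abstract does not spell out how the subspaces are stratified or how tree weights are computed, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes the features have already been split into two hypothetical strata (`strong` and `weak`, for example by a relevance score), that each subspace draws from both strata in proportion to their size, and that a tree's weight is its plain out-of-bag accuracy. All function and variable names are illustrative, and the sketch is single-machine Python rather than SPARK code.

```python
import random
from collections import defaultdict


def stratified_subspace(strong, weak, size, rng):
    """Draw a feature subspace containing features from both strata.

    Assumption: `strong` and `weak` are non-empty lists of feature names
    and 2 <= size <= len(strong) + len(weak).
    """
    total = len(strong) + len(weak)
    # Proportional share for the strong stratum, at least one feature,
    # and always leaving room for at least one weak feature.
    n_strong = max(1, round(size * len(strong) / total))
    n_strong = min(n_strong, len(strong), size - 1)
    n_weak = min(size - n_strong, len(weak))
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)


def oob_accuracy(predict, rows, labels, in_bag):
    """Accuracy of one tree on the rows that were NOT in its bootstrap sample."""
    oob = [i for i in range(len(rows)) if i not in in_bag]
    if not oob:
        return 0.0
    correct = sum(predict(rows[i]) == labels[i] for i in oob)
    return correct / len(oob)


def weighted_vote(votes, weights):
    """Combine per-tree votes, each counted with its tree's weight."""
    score = defaultdict(float)
    for label, weight in zip(votes, weights):
        score[label] += weight
    return max(score, key=score.get)


if __name__ == "__main__":
    rng = random.Random(0)
    subspace = stratified_subspace(["f1", "f2"], ["f3", "f4", "f5", "f6"], 3, rng)
    print(subspace)
    # A plain majority vote would pick "normal"; the weighted vote picks
    # "fraud" because the single dissenting tree has a much higher weight.
    print(weighted_vote(["fraud", "normal", "normal"], [0.95, 0.40, 0.45]))
```

With heavily imbalanced classes, plain accuracy can be a misleading weight, so a class-sensitive score such as F1 on the out-of-bag rows may be a better choice. The abstract does not say which measure the thesis uses.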
Keywords/Search Tags:SPARK, Random forests, Layered subspace, Weighted, SMS fraud user identification