
Research On Fraud Identification Of Vehicle Insurance Based On Machine Learning

Posted on: 2022-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Chu
Full Text: PDF
GTID: 2518306311468894
Subject: Applied Statistics
Abstract/Summary:
Motor vehicle insurance has long been the largest line of property insurance business in China. However, statistics show that about 20% of auto insurance claims may involve fraud, and fewer than 3% of suspected fraud cases are prosecuted. Because the auto insurance reform has reduced premium income, insurers can keep their operations healthy and become more competitive by reducing claim expenditure. Improving the identification rate of insurance fraud, and targeting it accurately, is therefore of great significance for reducing the cost of insurance premiums. Compared with manual review, identifying insurance fraud with machine learning models costs less and is more accurate.

This thesis applies individual statistical and machine learning models to auto insurance fraud identification, then uses Stacking to integrate multiple models and obtain a more stable model with higher prediction accuracy. First, the thesis summarizes the basic concepts of machine learning and the theory behind the models it uses. Second, data preprocessing methods (data cleaning, data transformation and feature selection) are applied to a public Kaggle data set, which reduces running time and improves prediction accuracy for the fraud identification models built afterwards. Third, taking the preprocessed data as input, fraud identification models based on Naive Bayes, SVM, AdaBoost and KNN are constructed separately, and their parameters are tuned to obtain better performance. The prediction results are compared using classification evaluation metrics: KNN classifies best overall, while Naive Bayes predicts worst overall but identifies fraud samples best among the four models.

Finally, the thesis introduces the basic workflow of Stacking and uses it to fuse the four models into a new auto insurance fraud identification model. This model combines the high stability of statistical models with the high prediction accuracy of machine learning models, and its prediction accuracy on both the whole sample and the fraud samples is significantly higher than that of any single model.
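The stacking step described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the thesis's actual code: the synthetic data from `make_classification` stands in for the Kaggle data set, and the logistic regression meta-learner, the feature scaling, and all hyperparameters are assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the claims data (assumption): about 20% of
# samples belong to the positive ("fraud") class.
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four base learners named in the abstract. SVM and KNN are
# distance-based, so they are wrapped with a scaler (assumption).
base_learners = [
    ("nb", GaussianNB()),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
]

# Out-of-fold predictions of the base learners (5-fold CV) become the
# input features of the meta-learner (logistic regression is an assumption).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
fraud_recall = recall_score(y_test, y_pred, pos_label=1)
print(f"overall accuracy: {accuracy:.3f}")
print(f"recall on fraud class: {fraud_recall:.3f}")
```

Reporting recall on the fraud class alongside overall accuracy mirrors the abstract's distinction between performance on the whole sample and on fraud samples; on imbalanced data the two can diverge considerably.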
Keywords/Search Tags: Machine learning, AdaBoost, Stacking, Feature selection