
Using Machine Learning Techniques To Compare Different Resampling Methods In Predicting Insurance Claims Occurrence

Posted on: 2022-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: Mohamed Hanafy Kotb Ibrahim
GTID: 2518306731994769
Subject: Financial statistics, risk management and actuarial science
Abstract/Summary:
Predicting the occurrence of insurance claims is a significant challenge because the datasets are imbalanced: the number of claims that occur is usually far lower than the number that do not. As a result, classification models tend to have limited ability to predict the occurrence of claims. In this thesis, we use several resampling methods to address the imbalanced-data problem in the insurance industry. We developed 84 machine learning models for predicting insurance claim occurrence: 4 resampling methods (under-sampling, over-sampling, a hybrid combination of over- and under-sampling, and SMOTE) × 21 classifiers (logistic regression, K-nearest neighbors, naïve Bayes, three decision tree models, three boosting models, two bagging models, SVM, four neural networks, and five deep neural networks) = 84 models. We compared the models' accuracy, sensitivity, and specificity to assess their predictive performance. The dataset contains 81,628 car insurance records: a claim occurred in 5,714 of them and did not occur in 75,914. According to the findings, the AdaBoost classifier gave the most accurate predictions: with oversampling it reached a sensitivity of 92.94%, a specificity of 99.82%, and an accuracy of 99.4%, and with the hybrid method it reached a sensitivity of 92.48%, a specificity of 99.63%, and an accuracy of 99.1%. The thesis thus finds that, when analyzing imbalanced data, the AdaBoost classifier with either oversampling or the hybrid method can generate more accurate models than the other models used in this study. More generally, when the machine learning models were applied to the balanced data created by the different resampling methods, the results of all classifiers improved and all classes could be predicted, indicating that the classifiers' performance is satisfactory.
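As a rough illustration of two of the resampling methods and the three evaluation metrics named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the thesis's code: the function names (`random_oversample`, `random_undersample`, `confusion_metrics`) and the toy data are assumptions made for illustration, and SMOTE and the hybrid method are not shown. In practice, resampling is typically applied to the training split only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and accuracy; assumes both classes appear in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),  # share of actual claims that were caught
        "specificity": tn / (tn + fp),  # share of non-claims correctly identified
        "accuracy": (tp + tn) / len(pairs),
    }


if __name__ == "__main__":
    # Hypothetical toy data with roughly the 7% claim rate reported in the abstract.
    toy_rows = list(range(100))
    toy_labels = [1] * 7 + [0] * 93
    _, over_labels = random_oversample(toy_rows, toy_labels)
    _, under_labels = random_undersample(toy_rows, toy_labels)
    print(Counter(over_labels), Counter(under_labels))
```

Reporting sensitivity and specificity alongside accuracy matters here because, with about 7% positive records, a model that never predicts a claim would still score about 93% accuracy while having zero sensitivity.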
Keywords/Search Tags:Automobile insurance, Insurance claims, Classification, Machine learning, Imbalanced data, Resampling methods, Statistical analysis