Using Machine Learning Techniques To Compare Different Resampling Methods In Predicting Insurance Claims Occurrence

Posted on:2022-12-11

Degree:Master

Type:Thesis

Institution:University

Candidate:Mohamed Hanafy Kotb Ibrahim

GTID:2518306731994769

Subject:Financial statistics, risk management and actuarial science

Abstract/Summary:

Predicting the frequency of insurance claims is a significant challenge because the datasets are imbalanced: the number of claims that occur is usually far lower than the number that do not. As a result, classification models tend to have limited ability to predict claim occurrence. In this thesis, we use several resampling methods to address the imbalanced-data problem in the insurance industry. We developed 84 machine learning models for predicting insurance claim occurrence by pairing 4 resampling methods (under-sampling, over-sampling, a hybrid combination of over- and under-sampling, and SMOTE) with 21 classifiers (logistic regression, K-nearest neighbors, naive Bayes, three decision tree models, three boosting models, two bagging models, SVM, 4 neural networks, and 5 deep neural networks), giving 4 × 21 = 84 models. We compared the models' accuracy, sensitivity, and specificity to assess their predictive performance. The dataset contains 81,628 car insurance records: 5,714 in which a claim occurred and 75,914 in which no claim occurred. According to the findings, the AdaBoost classifier gave the most accurate predictions under two resampling methods. With oversampling it reached a sensitivity of 92.94%, a specificity of 99.82%, and an accuracy of 99.4%; with the hybrid method it reached a sensitivity of 92.48%, a specificity of 99.63%, and an accuracy of 99.1%. This thesis confirmed that, when analyzing imbalanced data, the AdaBoost classifier, with either oversampling or the hybrid method, could generate more accurate models than the other models used in this study. More generally, when the machine learning models were applied to the balanced data produced by the different resampling methods, the results of all ML classifiers improved and all classes could be predicted, indicating that the classifiers' performance is satisfactory.
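The resampling idea and the three reported metrics can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`split_by_class`, `resample_indices`, `confusion_metrics`) are hypothetical, the hybrid rule used here (bring both classes to the midpoint size) is only one possible reading of "combination of over- and under-sampling", and SMOTE is left out because it synthesizes new feature vectors rather than re-drawing existing rows. Only the class counts (5,714 and 75,914) come from the abstract.

```python
import random
from collections import Counter


def split_by_class(labels):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    by_class = {0: [], 1: []}
    for i, y in enumerate(labels):
        by_class[y].append(i)
    minority, majority = sorted(by_class.values(), key=len)
    return minority, majority


def resample_indices(labels, method, rng):
    """Pick row indices so that both classes end up the same size.

    method: "under"  -> drop majority rows at random
            "over"   -> duplicate minority rows at random
            "hybrid" -> assumed rule: bring both classes to the midpoint size
    """
    minority, majority = split_by_class(labels)
    if method == "under":
        target = len(minority)
    elif method == "over":
        target = len(majority)
    elif method == "hybrid":
        target = (len(minority) + len(majority)) // 2
    else:
        raise ValueError(f"unknown method: {method}")

    def draw(pool):
        if target <= len(pool):
            return rng.sample(pool, target)  # shrink: without replacement
        # grow: keep every row, then add random duplicates
        return pool + rng.choices(pool, k=target - len(pool))

    return draw(minority) + draw(majority)


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (class 1 recall), specificity (class 0 recall).

    Assumes both classes appear in y_true, so neither denominator is zero.
    """
    pairs = Counter(zip(y_true, y_pred))
    tp, fn = pairs[(1, 1)], pairs[(1, 0)]
    tn, fp = pairs[(0, 0)], pairs[(0, 1)]
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Class counts from the abstract: 1 = claim occurred, 0 = no claim.
labels = [1] * 5714 + [0] * 75914
rng = random.Random(42)
for method in ("under", "over", "hybrid"):
    idx = resample_indices(labels, method, rng)
    print(method, Counter(labels[i] for i in idx))

# A "model" that always predicts "no claim", scored on the original data.
baseline = confusion_metrics(labels, [0] * len(labels))
print(baseline)
```

With these class counts, the always-"no claim" baseline scores about 93% accuracy with 0% sensitivity. This is why the abstract reports sensitivity and specificity alongside accuracy, and why the classes are rebalanced before the classifiers are fitted.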

Keywords/Search Tags:

Automobile insurance, Insurance claims, Classification, Machine learning, Imbalanced data, Resampling methods, Statistical analysis

Related items

1	Research On Automobile Insurance Fraud Imbalanced Classification Based On Sampling Technology
2	Research On The Identification And Prevention And Control Of Claims Risk Of Accidental Injury Insurance
3	Design And Implementation Of The Insurance Claims Internet Query System In Anhua Agricultural Insurance Company
4	Research And Analysis On Forest Insurance Claim System Of Yong'an Insurance Company
5	Research About Anti-Fraud Detection Of Vehicle Insurance Claims Based On Data Mining Technology
6	Car Insurance Claims Based On 3G Network Mobile Video Survey
7	The Design And Implementation Of Travel Insurance Claims Work Flow System For Insurance Broker Company
8	Analysis And Design Of Auto Insurance Claims System Of Taizhou Branch Of China People's Property Insurance Co., Ltd
9	Design And Implementation Of On Auto Insurance Claims System For Sunshine Company
10	Analysis And Design Of Auto Insurance Claims System