Insurance Cross-selling Prediction Based On Imbalanced Data

Posted on: 2022-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: S J Wang
Full Text: PDF
GTID: 2518306611996339
Subject: Insurance
Abstract/Summary:
With the rapid development of the national economy, living standards have improved markedly. People now pay more attention to the safety of their property, and insurance has become an important means of asset protection for residents. However, as market competition intensifies, the traditional insurance industry finds it difficult both to acquire new customers and to retain existing ones, so the digital transformation of marketing has become an inevitable trend. In recent years, continuous improvements in information technology and data mining have allowed enterprises to understand their customers more fully, uncover the purchasing power of potential customers, and carry out precisely targeted cross-selling.

Based on real customer data from an insurance company, this paper uses data mining to classify customers who have already purchased the company's health insurance, in order to identify those who may be interested in its auto insurance and to improve the success rate of cross-selling. Since target customers usually account for only a small portion of the total customer base, data imbalance is a key concern in this paper. The paper first explores the relationships between variables through data visualization, using graphs to show how different variables relate to whether a user chooses to buy auto insurance. It then applies five resampling methods at the data level to construct balanced datasets, and finally models the data with four algorithms: logistic regression, decision tree, AdaBoost and AdaCost.

The results show that (1) at the data resampling level, SMOTETomek hybrid sampling yields better classification results for every model than the other four undersampling and oversampling methods; (2) at the metric level, logistic regression achieves the best recall, identifying 97.5% of minority-class samples. The AdaBoost model is more suitable than the other models for classifying imbalanced data, with an accuracy of 88% and an AUC of 0.96, and can largely identify potential auto insurance customers of this insurance company.
Keywords/Search Tags: Imbalanced data, Cross-selling, Resampling, AdaBoost, Decision tree
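
The abstract rests on two ideas: accuracy alone can mislead when the target class is rare, and resampling can rebalance the training data before modeling. The sketch below illustrates both on made-up toy data using only the Python standard library. It is not the thesis's pipeline: plain random oversampling stands in for the five resampling methods (SMOTETomek is not implemented here), and the function names `random_oversample`, `accuracy` and `recall`, the 90/10 class split and the row contents are all assumptions chosen for illustration.

```python
import random


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size.

    Illustrative stand-in for a resampling step; not the SMOTETomek method
    named in the abstract.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx:
        raise ValueError("no minority-class samples to oversample")
    # Number of duplicates needed to match the majority class (none if already balanced).
    shortfall = max(0, len(majority_idx) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(shortfall)]
    keep = majority_idx + minority_idx + extra
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions recover."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 90 uninterested customers (0), 10 interested (1).
labels = [0] * 90 + [1] * 10
rows = [[i] for i in range(100)]

# A "model" that always predicts "not interested" looks accurate
# but finds none of the target customers.
always_no = [0] * 100
print(accuracy(labels, always_no))
print(recall(labels, always_no))

# After oversampling, both classes are equally represented.
bal_rows, bal_labels = random_oversample(rows, labels)
print(bal_labels.count(0), bal_labels.count(1))
```

On this toy data the always-negative predictor scores high accuracy with zero recall, which is why the abstract reports recall and AUC alongside accuracy. In practice, resampling would be applied to the training split only, so that evaluation still reflects the original class distribution.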