
Study On The Frequency Of Auto Insurance Claims Based On Ensemble Learning Algorithm

Posted on: 2021-06-22   Degree: Master   Type: Thesis
Country: China   Candidate: C C Han
GTID: 2518306113955959   Subject: Master of Insurance
Abstract/Summary:
The business structure of Chinese property insurance companies means that automobile insurance has always occupied a central position in their portfolios. According to statistics from the China Banking and Insurance Regulatory Commission, in 2018 the 55 property insurers whose auto insurance premiums exceeded 100 million yuan wrote about 779.9 billion yuan of auto premiums in total, yet their combined underwriting profit was only 2.42 billion yuan, a year-on-year decrease of 71%. The decline in profit has three main causes. First, the withdrawal of the preferential policy on car purchases and the adjustment of the auto insurance rate reform ("fee reform") policy brought the market into an expected downturn. Second, fierce market competition drove up the operating cost of the auto insurance business. Third, at the technical level, auto insurance rates were not set reasonably, so premiums were poorly matched to risk. With the auto insurance market in "hot water", companies are under growing pressure to strengthen their independent pricing capability and operate stably.

Auto insurance pricing models include the claim frequency model, the claim severity model, and the aggregate loss model; in practice, the combination of models used to set rates is chosen according to the characteristics of the data. In recent years machine learning algorithms have been widely adopted as prediction tools, improving the accuracy of solutions to complex problems in many fields, and they have also become an effective approach to predicting insurance claims. On large samples, the generalized linear model (GLM), a conventional data-processing method, fits less well than machine learning algorithms, and the larger the sample, the more pronounced the advantage of machine learning. The main advantage of machine learning is that it does not rely on distributional assumptions, which can improve the accuracy of insurance loss prediction to a certain extent. Its disadvantages are that it is time-consuming, requires more human intervention during modeling, places higher demands on the modeler, and produces results that are less interpretable than those of the GLM. When real claim data do not satisfy the distributional assumptions required by the GLM, machine learning offers a solution. Beyond using machine learning alone, its results can be compared with those of the GLM to help check the claim predictions for errors. Applying machine learning to real claim data raises its own problems, however: the observed data do not match the actual distribution, and the number of risk units (exposure) differs across records, both of which complicate the modeling process.

In the era of big data, auto insurance is set to be transformed: data connectivity and algorithm optimization will rewrite the logic of auto insurance pricing. This paper therefore proposes effective solutions to the problems above. Applied to actual auto insurance claim data, the XGBoost (eXtreme Gradient Boosting) method is shown to improve pricing accuracy effectively and to provide a new reference standard for the industry. In short, with the continuing development of Internet of Vehicles technology and the deepening reform of auto insurance rates, machine learning methods can make rate setting more reasonable and fair, and China's auto insurance market will continue to become more standardized.

Using an original data set of commercial vehicle damage insurance from a property and casualty insurance company in China, this paper builds claim frequency prediction models with a GLM, a regression tree, GBDT, and XGBoost. The more accurate model is identified by comparing and optimizing the models, and the predictions of the XGBoost model are used to rank the relative importance of the risk factors among the model features, providing a reference for auto insurance underwriting, pricing, and claims settlement.

The paper consists of five mutually supporting chapters that follow the logic of applying auto insurance pricing models in a machine learning setting.

Chapter 1 introduces the background and significance of the research, reviews classic and frontier domestic and international literature to establish the current state of research on improving claim frequency prediction, and sets out the research approach and chapter structure.

Chapter 2 analyzes the key considerations in predicting motor insurance claim frequency, chiefly the basic model form, the loss function, and the criteria for evaluating performance both within a single model and across models with different features, laying the groundwork for the comparisons that follow.

Chapter 3, having fixed claim frequency as the prediction target, examines the data set in depth before the models are selected. Rather than the foreign auto insurance data sets used in earlier studies, the paper uses vehicle damage insurance data from a Chinese property insurance company, in keeping with the domestic big-data context and the development of the insurance industry. Data cleaning, descriptive statistics of the variables, and correlation analysis are used to understand the characteristics of the data, and several methods are applied to handle the imbalance in the number of claims so as to improve prediction.

Chapter 4 is the core of the paper. The sample is split into a training set and a validation set, the in-sample and out-of-sample losses of each model are compared, and the models' prediction accuracy and generalizability are evaluated together. Starting from the GLM, a traditional mean-based statistical model, the paper compares it with the boosting tree model (GBDT) and the XGBoost model, both based on boosting; for each model it covers the theory, feature preprocessing, parameter selection, model optimization, and analysis of predictive ability, and then compares their strengths and weaknesses. Manual intervention is kept to a minimum in favor of a data-driven approach: in the GLM, numerical variables are binned according to a regression tree, which yields a better model than binning based on industry experts' opinions.

Chapter 5 summarizes the evaluation metrics from Chapter 4, analyzes the strengths, weaknesses, and application scenarios of each model, and draws conclusions on model selection. For the technical problems of claim frequency prediction and pricing in motor vehicle insurance, it identifies directions for further work, including more comprehensive data in the big-data context, higher data dimensionality, combination forecasting to further improve accuracy, and stronger machine learning models.
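The abstract compares each model's in-sample and out-of-sample loss but does not name the loss function. A common choice for claim counts is the Poisson deviance, with each policy's expected count scaled by its exposure; treat that as an assumption here. The minimal sketch below computes the mean Poisson deviance and contrasts the two losses for a constant-frequency model fitted on hypothetical toy records of (claim count, exposure in policy-years). The function names are illustrative, not taken from the thesis.

```python
import math

# Assumption: mean Poisson deviance as the loss; the thesis does not specify it.
def poisson_deviance(y, mu):
    """Mean Poisson deviance of observed claim counts y against expected counts mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y*log(y/mu) is taken as 0 when y == 0
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (term - (yi - mi))
    return total / len(y)

def in_and_out_of_sample(train, valid):
    """Fit one claim frequency on `train`, then score both sets.

    Each record is (claim_count, exposure_in_policy_years).
    """
    # Poisson MLE of a single rate with exposure: total claims / total exposure
    rate = sum(c for c, _ in train) / sum(e for _, e in train)

    def score(data):
        # expected count = frequency * exposure
        return poisson_deviance([c for c, _ in data], [rate * e for _, e in data])

    return score(train), score(valid)

# Hypothetical toy records: (claim count, exposure in policy-years)
train = [(0, 1.0), (1, 1.0), (0, 0.5), (2, 1.0), (0, 1.0)]
valid = [(0, 1.0), (1, 0.5), (0, 1.0)]
in_dev, out_dev = in_and_out_of_sample(train, valid)
print(f"in-sample {in_dev:.4f}, out-of-sample {out_dev:.4f}")
# prints: in-sample 1.0411, out-of-sample 1.1769
```

A gap in which the out-of-sample loss is clearly above the in-sample loss is the usual signal of overfitting when the same comparison is run across richer models.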
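For the GLM baseline, one standard specification for claim frequency is a Poisson GLM with a log link and log(exposure) as an offset, which is one way to deal with records observed for different numbers of risk units. The thesis does not spell out its GLM specification, so the Poisson family, the offset, and the Newton-Raphson loop below are assumptions, and `fit_poisson_glm` and `solve` are hypothetical helper names. The sketch uses only the standard library.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_poisson_glm(X, y, exposure, max_iter=50, tol=1e-10):
    """Assumed model: log E[y_i] = log(e_i) + x_i . beta (Poisson, log link, offset).

    Fitted by Newton-Raphson, which coincides with IRLS for the canonical log link.
    """
    p = len(X[0])
    beta = [0.0] * p
    offset = [math.log(e) for e in exposure]
    for _ in range(max_iter):
        mu = [math.exp(o + sum(xj * bj for xj, bj in zip(x, beta)))
              for x, o in zip(X, offset)]
        # score X^T (y - mu) and Fisher information X^T diag(mu) X
        score = [sum(x[j] * (yi - mi) for x, yi, mi in zip(X, y, mu)) for j in range(p)]
        info = [[sum(x[j] * x[k] * mi for x, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical toy portfolio: intercept column plus one 0/1 rating factor.
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1]]
y = [0, 1, 0, 1, 2]
exposure = [1.0, 1.0, 1.0, 1.0, 1.0]
beta = fit_poisson_glm(X, y, exposure)
# fitted claim frequency per factor level, approximately 1/3 and 3/2
print(math.exp(beta[0]), math.exp(beta[0] + beta[1]))
```

With a single categorical factor, the fitted frequencies reproduce each level's empirical claims per unit of exposure, which makes a quick sanity check before adding more rating factors.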
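The abstract reports that letting a regression tree choose the bins of numerical variables gave a better GLM than expert-chosen bins. The sketch below shows one way to do that under stated assumptions: candidate splits are scored by the reduction in Poisson deviance, each bin must keep a minimum exposure, and the depth is capped. The thesis may use a different split criterion (for example squared error on observed frequency); `tree_cut_points`, `min_exposure`, and the toy driver-age data are hypothetical.

```python
import math

def group_deviance(ys, es):
    """Poisson deviance of one group fitted with its own frequency sum(y)/sum(exposure)."""
    rate = sum(ys) / sum(es)
    dev = 0.0
    for y, e in zip(ys, es):
        mu = rate * e
        dev += 2.0 * (y * math.log(y / mu) - (y - mu)) if y > 0 else 2.0 * mu
    return dev

def best_split(x, y, exposure, min_exposure):
    """Best threshold t for the split x <= t, scored by Poisson deviance reduction (assumed criterion)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    parent = group_deviance(y, exposure)
    best_t, best_gain = None, 0.0
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold separates equal values
        lo, hi = order[:k], order[k:]
        e_lo = [exposure[i] for i in lo]
        e_hi = [exposure[i] for i in hi]
        if sum(e_lo) < min_exposure or sum(e_hi) < min_exposure:
            continue  # keep enough exposure in each bin for a stable frequency
        child = (group_deviance([y[i] for i in lo], e_lo)
                 + group_deviance([y[i] for i in hi], e_hi))
        if parent - child > best_gain:
            best_t = (x[order[k - 1]] + x[order[k]]) / 2
            best_gain = parent - child
    return best_t

def tree_cut_points(x, y, exposure, max_depth=2, min_exposure=1.0):
    """Recursively split a numeric rating variable; returns sorted bin edges."""
    if max_depth == 0 or len(x) < 2:
        return []
    t = best_split(x, y, exposure, min_exposure)
    if t is None:
        return []
    left = [i for i in range(len(x)) if x[i] <= t]
    right = [i for i in range(len(x)) if x[i] > t]

    def sub(idx):
        return [x[i] for i in idx], [y[i] for i in idx], [exposure[i] for i in idx]

    return (tree_cut_points(*sub(left), max_depth - 1, min_exposure) + [t]
            + tree_cut_points(*sub(right), max_depth - 1, min_exposure))

# Hypothetical driver ages, claim counts and exposures (policy-years)
ages = [20, 22, 25, 30, 40, 50, 60, 70]
claims = [2, 2, 2, 0, 0, 0, 0, 0]
years = [1.0] * 8
print(tree_cut_points(ages, claims, years))  # [27.5]
```

The returned cut points would then define the levels of the corresponding categorical factor in the GLM, replacing hand-picked age bands.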
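GBDT and XGBoost both build an additive score on the log-frequency scale from shallow trees. The sketch below is a hand-written, second-order boosting of depth-1 trees in the spirit of XGBoost (leaf weight -G/(H+λ) and the matching split gain); it is not the xgboost library and not necessarily the thesis's configuration. It also accumulates each feature's split gain, one plausible way to rank the relative importance of risk factors. The learning rate, λ, number of rounds, function names, and toy data are all assumptions. In the xgboost package itself, a comparable setup would use the `count:poisson` objective with log(exposure) supplied as the base margin.

```python
import math

def fit_stump(X, g, h, lam):
    """Best depth-1 tree under the second-order (XGBoost-style) split gain."""
    G, H = sum(g), sum(h)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        gl = hl = 0.0
        for k in range(len(order) - 1):
            i = order[k]
            gl += g[i]
            hl += h[i]
            a, b = X[i][j], X[order[k + 1]][j]
            if a == b:
                continue  # cannot split between equal values
            gr, hr = G - gl, H - hl
            gain = 0.5 * (gl ** 2 / (hl + lam) + gr ** 2 / (hr + lam) - G ** 2 / (H + lam))
            if best is None or gain > best[0]:
                best = (gain, j, (a + b) / 2, -gl / (hl + lam), -gr / (hr + lam))
    return best

def boost_poisson(X, y, exposure, rounds=50, eta=0.3, lam=1.0):
    """Boost stumps on the log-frequency scale with a log(exposure) offset (assumed setup)."""
    offset = [math.log(e) for e in exposure]
    base = math.log(sum(y) / sum(exposure))  # portfolio frequency; needs at least one claim
    score = [base] * len(X)
    trees, importance = [], [0.0] * len(X[0])
    for _ in range(rounds):
        mu = [math.exp(s + o) for s, o in zip(score, offset)]
        g = [m - yi for m, yi in zip(mu, y)]  # gradient of the Poisson negative log-likelihood
        h = mu                                # its second derivative
        stump = fit_stump(X, g, h, lam)
        if stump is None or stump[0] <= 0:
            break  # no split improves the objective
        gain, j, t, wl, wr = stump
        importance[j] += gain  # total split gain per feature
        trees.append((j, t, eta * wl, eta * wr))
        score = [s + (eta * wl if x[j] <= t else eta * wr) for s, x in zip(score, X)]
    return base, trees, importance

def predict_frequency(model, x):
    """Predicted claims per unit of exposure for one feature vector x."""
    base, trees, _ = model
    return math.exp(base + sum(wl if x[j] <= t else wr for j, t, wl, wr in trees))

# Hypothetical data: feature 0 is driver age, feature 1 is a pure-noise 0/1 flag.
ages = [20, 22, 25, 30, 40, 50, 60, 70]
claims = [2, 1, 2, 0, 0, 1, 0, 0]
X = [[a, flag] for a in ages for flag in (0, 1)]  # each policy once per flag value
y = [c for c in claims for _ in (0, 1)]
exposure = [1.0] * len(X)
model = boost_poisson(X, y, exposure)
print(model[2])  # gain-based importance: age should dominate the noise flag
print(predict_frequency(model, [20, 0]), predict_frequency(model, [70, 0]))
```

Because every policy is duplicated across both values of the noise flag, the flag can never separate policies with different residuals, so its accumulated gain stays at (or numerically near) zero while age absorbs the importance.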
Keywords/Search Tags: Claim frequency, Generalized Linear Regression, Decision tree, Ensemble learning