
Loss Prediction And Evaluation Of Vehicle Insurance Based On Convolutional Neural Network And XGBoost

Posted on: 2024-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: D F Xu
Full Text: PDF
GTID: 2530307088457014
Subject: Insurance
Abstract/Summary:
As living standards continue to improve, the number of cars in China keeps growing, and auto insurance is receiving more and more attention. As a necessity for vehicle owners, auto insurance accounts for half of the business of property insurance companies and is of great importance to their development. From a non-life actuarial perspective, the weak generalization of claim-frequency prediction models and the formulation of differentiated premiums have long been weaknesses of insurance companies. In the past, insurance companies often used generalized linear models to fit claim frequency data. However, this method may lack generalization ability when the data have a high feature dimension. With the development of machine learning in recent years, machine learning algorithms can also be applied to predicting claim frequency, and they have their own advantages and disadvantages. Their advantage is that no assumption about the data distribution is needed in advance, and the resulting model has better performance and generalization ability. Their shortcomings are also evident: because parameter tuning takes place in a "black box", the final model is hard to interpret, and both data imbalance and the choice of hyperparameters affect its final performance. Therefore, this paper uses optimization methods to address data imbalance and hyperparameter selection, then uses different ensemble learning algorithms and generalized linear models to predict claim frequency, and uses the macro-average F1 evaluation index to find the most appropriate algorithm for the claim data, in order to provide a reference for insurance companies when pricing. The research content of this paper covers the following four aspects:

(1) For unbalanced data such as the insurance claim data set, this paper improves on SMOTE and proposes LK+B-SMOTE, which combines Borderline-SMOTE, the K-means++ clustering algorithm, and the Tomek Link algorithm. First, before up-sampling, the minority class is divided into clusters using K-means++. The clusters are then labeled as noise, boundary, or safe clusters following the Borderline-SMOTE idea, and the interpolation formula is used to create new sample points in the boundary clusters. Finally, the Tomek Link algorithm is used to delete sample points that may blur the class boundary. This improved method not only addresses the blindness of SMOTE but also sharpens the boundary between the different classes.

(2) For hyperparameter selection in the ensemble learning algorithms, this paper improves on the quantum particle swarm optimization (QPSO) algorithm. The change in each particle's fitness across iterations is taken as a reference, and the idea of simulated annealing is adopted to optimize the expansion-contraction coefficient. Instead of following the traditional linearly decreasing schedule, the coefficient is adjusted according to the actual state of the model. An evolution factor is also introduced to keep the quantum particle swarm optimization algorithm from becoming trapped in a local optimum and to help it find the global optimum. Finally, the improved algorithm is applied to simulation functions, and the results show that the optimization accuracy of the improved DSAPSO algorithm is clearly better than that of the PSO and QPSO algorithms.

(3) The generalized linear model and the ensemble learning models are each applied to the insurance claim data, and the advantage of the improved methods is verified by comparing different imbalance-handling and hyperparameter selection methods. The performance of the random forest, XGBoost, and CatBoost models is compared. The results show that the combination of the improved unbalanced-data processing method, the improved hyperparameter selection method, and the CatBoost algorithm achieves the best results. Finally, model stacking is used to fuse the generalized linear model with the ensemble learning algorithm, which improves the performance of the model while preserving its generalization ability.

(4) For the judgment of loss categories, the convolutional neural network is improved by combining the AlexNet network framework, a random downsampling algorithm, and the ReLU activation function. According to the actual task requirements of this paper, the number of layers, the size of the convolutional kernels, the number of convolutional kernels, and other hyperparameters of the convolutional neural network are adjusted manually. Finally, DropConnect, Batch, and other optimization algorithms are used to further improve the classification performance and training speed of the convolutional neural network model.
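
The macro-average F1 index named in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `macro_f1` and the toy labels are hypothetical and chosen only to show why the index suits unbalanced claim data. Macro-average F1 is the unweighted mean of the per-class F1 scores, so the rare "claim" class counts as much as the common "no claim" class.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper, not from the thesis)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 8 policies without a claim (0) and 2 with a claim (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_all_zero = [0] * 10                        # always predicts "no claim"
y_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]     # finds one of the two claims

for name, y_pred in [("all-zero", y_all_zero), ("model", y_model)]:
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(name, "accuracy:", accuracy, "macro F1:", round(macro_f1(y_true, y_pred), 4))
```

Both toy predictors reach the same accuracy of 0.8, but the one that ignores the minority class scores a lower macro-average F1, which is the behavior that makes the index useful when claims are rare.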
Keywords/Search Tags: Frequency of claims, Integrated learning, Data imbalance, SMOTE, Quantum particle swarm optimization, Convolutional Neural Networks