Research And Application Of Machine Learning Technology For Risk Prediction Of Critical Diseases Claims Fraud

Posted on: 2020-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Wu
GTID: 2518306500987029
Subject: Computer technology
Abstract/Summary:
With the rapid development of the insurance industry, insurance fraud has become one of the major factors hindering its growth. Critical illness insurance, a key product in the current insurance market, has always been the area worst hit by insurance fraud. Traditional fraud detection has two disadvantages, high labor cost and slow updating, and is therefore poorly suited to today's fast-growing business. Using machine learning methods to assist fraud detection for critical illness insurance can alleviate these problems. Based on the claims data of a life insurance company in China, this paper applies a variety of data processing methods and then uses the random forest algorithm for predictive modeling.

By studying claims cases, this paper first summarizes five main aspects related to fraud: claims, policies, customers, agents, and regions. Based on these five aspects, all related data fields were extracted from the company's database, and more than 40 features were selected using the chi-square test and ANOVA. After preprocessing these features, a prediction model is built with random forest.

Data exploration shows that, because the ability to detect fraud is limited in several regions, the amount of noise in the training data differs from region to region. To identify this noise, this paper uses linear regression to fit the fraud proportion across all regions and takes the fitted value as each region's expected fraud proportion. If a region's observed fraud proportion is lower than its expected proportion, the region is considered noisy. The training set is then optimized by eliminating the noisy data.

The optimized model improves the area under the ROC curve (AUC) by about 2 percentage points compared with the model before optimization. However, the optimization also reduces the model's scope of application to a certain extent. In addition, this paper compares several different processing methods and algorithms, and finally summarizes business recommendations that may be useful in practice, based on the feature importance of the random forest.
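The region-level noise check described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation. The abstract does not say which variables the linear regression uses, so the single `covariate` field is hypothetical, as are all names and numbers below. The sketch also assumes that "eliminating the noise" means dropping every training row from a flagged region.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    name: str
    covariate: float  # hypothetical region-level predictor (not named in the abstract)
    claims: int       # number of claims in the region
    frauds: int       # number of claims labeled as fraud

    @property
    def fraud_rate(self) -> float:
        return self.frauds / self.claims


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def noisy_regions(regions):
    """Return names of regions whose observed fraud rate is below the fitted rate."""
    xs = [r.covariate for r in regions]
    ys = [r.fraud_rate for r in regions]
    intercept, slope = fit_line(xs, ys)
    return {
        r.name
        for r in regions
        if r.fraud_rate < intercept + slope * r.covariate
    }


def drop_noisy_rows(rows, noisy):
    """Assumed cleaning step: remove all training rows from flagged regions."""
    return [row for row in rows if row["region"] not in noisy]


# Made-up illustrative data, not taken from the thesis.
regions = [
    RegionStats("A", 1.0, 1000, 20),
    RegionStats("B", 2.0, 1000, 10),
    RegionStats("C", 3.0, 1000, 40),
    RegionStats("D", 4.0, 1000, 50),
]
rows = [
    {"region": "A", "label": 0},
    {"region": "B", "label": 0},
    {"region": "C", "label": 1},
]

noisy = noisy_regions(regions)          # region B falls below the fitted line
clean_rows = drop_noisy_rows(rows, noisy)
```

Because a least-squares line with an intercept leaves residuals that sum to zero, some regions will always fall below the fitted value. A practical version would likely need a margin or threshold before flagging a region. Dropping whole regions is also consistent with the abstract's remark that the optimized model has a narrower scope of application.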
Keywords/Search Tags: Claims Fraud, Random Forest, Machine Learning, Trainset Optimization