| With the vigorous development of China’s economy, credit products have gradually become widespread, and credit risk factors have grown more complicated. The rapid development of the automobile industry is inseparable from the support of the financial industry, and installment payment has become the mainstream way of purchasing cars. In recent years, automotive suppliers have increased car-purchase subsidies in order to seize market share in new energy vehicles. At the same time, to achieve the goal of carbon neutrality, the Chinese government has increased its support for new energy vehicles in terms of funding, technology, and talent, and has encouraged people to purchase them. Driven by these factors, consumers’ desire to purchase new energy vehicles has reached an unprecedented height. However, credit problems such as high bad-loan rates and a lack of risk management and control technology continue to emerge. The contradiction between inefficient traditional credit evaluation and risk control models and the growing demand for automotive credit will become a key factor restricting the development of the auto industry. Machine learning methods can effectively resolve this contradiction. Machine learning relies on data mining technology to train, model, analyze, and predict on consumers’ personal information in a very short time. Based on business understanding, this thesis adopts efficient machine learning methods to deeply mine customer information and compares the predictions of different models side by side. It proposes selecting models on demand: rather than mechanically fixing one model as the core of the car loan default risk identification system, financial institutions should choose the model that best suits their needs. This research therefore has practical significance in helping financial institutions accurately identify default users and theoretical significance in enriching the theory of credit loan 
risk management in China. Chapter 2 gives a detailed introduction to the principles of the selected machine learning models: Logistic Regression, Random Forest, AdaBoost, and LightGBM. Chapter 3 introduces the data source and the preparation work, including data cleaning, feature extraction, visualization, and training set balancing. In Chapter 4, the balanced training set is used to build the models, and the test set is used to evaluate their predictive performance. Finally, the models are compared on the evaluation metrics. LightGBM has the highest accuracy, 84.16%, while the logistic regression model has the lowest, only 76.97%. AdaBoost has the largest AUC value, 0.7549, followed by the optimized logistic regression model at 0.7411; the difference between the two is small. Considering AUC alone, AdaBoost is slightly better than the logistic regression model. However, because the gap in classification performance is small and the logistic regression model is more efficient to run and more interpretable, the logistic regression model is the better choice of the two. The AdaBoost model has the largest F1 value among the five models, 0.614, and the F1 value of the LightGBM model is 0.610. The difference is very small, but AdaBoost runs less efficiently than LightGBM, so on the whole LightGBM is better than AdaBoost. In general, the models each have their own advantages and disadvantages, and the appropriate model should be selected according to the development strategy of the financial institution. The LightGBM model is suitable for distinguishing default users from non-default users with high accuracy while minimizing the misjudgment of non-default users as default users. It therefore suits the business expansion stage of a financial institution, which adopts a looser risk control strategy to increase 
revenue. The optimized logistic regression model is suitable when the priority is to misclassify as few default users as possible as non-default users. It therefore suits the business contraction stage of a financial institution, which adopts stricter risk control measures to reduce the bad debt rate. |
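
The idea of selecting models on demand can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the confusion-matrix counts, the names `model_a` and `model_b`, and the `pick_model` helper are all hypothetical and are not results from this work. The sketch assumes that an "expansion" strategy favors the lowest false positive rate (fewest non-default users misjudged as default) and that a "contraction" strategy favors the highest recall (fewest default users misjudged as non-default). AUC is left out because it requires predicted scores rather than counts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts, where 'positive' means a default user."""
    tp: int  # default users correctly flagged
    fp: int  # non-default users wrongly flagged as default
    fn: int  # default users missed (classified as non-default)
    tn: int  # non-default users correctly passed


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: Confusion) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    defaults = c.tp + c.fn
    return c.tp / defaults if defaults else 0.0


def false_positive_rate(c: Confusion) -> float:
    non_defaults = c.fp + c.tn
    return c.fp / non_defaults if non_defaults else 0.0


def f1(c: Confusion) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def pick_model(candidates: Dict[str, Confusion], strategy: str) -> str:
    """Hypothetical selection rule; the strategy names are assumptions."""
    if strategy == "expansion":
        # Looser risk control: reject as few good customers as possible.
        return min(candidates, key=lambda n: false_positive_rate(candidates[n]))
    if strategy == "contraction":
        # Stricter risk control: miss as few default users as possible.
        return max(candidates, key=lambda n: recall(candidates[n]))
    raise ValueError(f"unknown strategy: {strategy!r}")


# Illustrative counts only (1000 test users, 100 of them default users).
candidates = {
    "model_a": Confusion(tp=40, fp=10, fn=60, tn=890),
    "model_b": Confusion(tp=75, fp=150, fn=25, tn=750),
}

for name, c in candidates.items():
    print(name, f"acc={accuracy(c):.3f}", f"recall={recall(c):.3f}",
          f"fpr={false_positive_rate(c):.3f}", f"f1={f1(c):.3f}")

print("expansion   ->", pick_model(candidates, "expansion"))
print("contraction ->", pick_model(candidates, "contraction"))
```

With these made-up counts, the more accurate `model_a` is chosen for expansion and the higher-recall `model_b` for contraction, which mirrors the trade-off described above: the model with the best accuracy is not necessarily the one that misses the fewest default users.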