| With the rapid development of China’s automotive industry, automotive finance has become an indispensable part of the industry chain, and more and more consumers choose to purchase cars through automotive finance loans. However, this rapid growth has also exposed problems: the methods that automotive finance institutions use to evaluate loan customers are too subjective and lack a scientific basis, and the default rate of loan customers is relatively high. To address these issues, this paper uses machine learning to construct a prediction model that improves the assessment of how likely a credit customer is to default, helps staff at finance companies identify potential defaulters, and avoids credit risk as far as possible. First, descriptive statistics are computed on the source data (borrower information from an auto lending institution), and the data are preprocessed, mainly by eliminating outliers, deleting redundant features, and discretizing continuous values. Because the source data are imbalanced, the ADASYN oversampling method is used to balance the classes. Feature engineering is then applied to the processed data: feature derivation increases the number of features to 52, and feature selection filters them down to a dataset of the 32 most valuable features. Second, three ensemble learning algorithms (random forest, LightGBM, and XGBoost) are selected to build prediction models on the dataset, and each is tuned with both grid search and a genetic algorithm. The experimental results show that the models optimized by the genetic algorithm predict significantly better than those optimized by grid search, with the genetic-algorithm-optimized XGBoost performing best. A comparison of the three ensemble learning models with non-ensemble learning models shows that the ensemble models achieve much higher predictive performance. Finally, the optimized models are fused using a genetic-algorithm-based GA fusion method, as well as the voting, Stacking, and Blending methods proposed in this paper. The experimental results show that the fused models predict better than any single model. The best results come from Stacking fusion, with the genetic-algorithm-optimized XGBoost, LightGBM, and random forest as base learners and a decision tree as the meta-learner. Compared with the single models, its accuracy, F1 score, and AUC all improve by more than 0.01, which effectively improves the assessment of loan customers’ default probability. The Stacking fusion model is therefore selected as the final automobile loan default prediction model, and its feature importance ranking is visualized to enhance interpretability and to support effective prediction of whether automobile loan customers will default. In addition, an automobile loan default prediction system is built with the Django framework so that the model can be applied in automobile finance institutions to improve their efficiency and reduce staff workload. |
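
The ADASYN oversampling step named in the abstract can be illustrated with a small, dependency-free sketch. It is not the paper's implementation, which the abstract does not show. The function name `adasyn`, the helper `_k_nearest`, the parameters `k`, `beta`, and `seed`, and the toy data are all assumptions made for illustration. The idea is that minority samples surrounded by more majority neighbours receive a larger share of the synthetic points, and each synthetic point is an interpolation between a minority sample and one of its minority neighbours.

```python
import math
import random


def _k_nearest(idx, candidates, X, k):
    """Indices of the k candidates closest to X[idx] (Euclidean), excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def adasyn(X, y, minority=1, k=3, beta=1.0, seed=0):
    """Return a list of synthetic minority samples (illustrative ADASYN sketch).

    beta=1.0 aims for roughly balanced classes; rounding may shift the
    number of generated samples by a few points.
    """
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    min_idx = [i for i in all_idx if y[i] == minority]
    n_maj = len(X) - len(min_idx)
    total_new = int((n_maj - len(min_idx)) * beta)
    if total_new <= 0 or len(min_idx) < 2:
        return []

    # r_i: share of majority points among the nearest neighbours of each
    # minority point; a higher share marks a "harder" region.
    ratios = []
    for i in min_idx:
        neigh = _k_nearest(i, all_idx, X, k)
        ratios.append(sum(y[j] != minority for j in neigh) / len(neigh))

    norm = sum(ratios)
    if norm == 0:
        # No minority point has majority neighbours: spread samples evenly.
        weights = [1.0 / len(min_idx)] * len(min_idx)
    else:
        weights = [r / norm for r in ratios]

    synthetic = []
    for i, w in zip(min_idx, weights):
        n_new = round(w * total_new)
        partners = _k_nearest(i, min_idx, X, k)  # minority neighbours only
        for _ in range(n_new):
            j = rng.choice(partners)
            lam = rng.random()
            # Interpolate between X[i] and the chosen minority neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data (hypothetical): 8 majority points (label 0), 3 minority points (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [0, 2], [1, 2],
     [3, 3], [3, 4], [1.5, 1.5]]
y = [0] * 8 + [1] * 3
synthetic = adasyn(X, y, minority=1, k=3, seed=0)
print(len(synthetic), "synthetic minority samples generated")
```

In practice a maintained implementation, such as the `ADASYN` class in the imbalanced-learn package, would normally be preferred. As a general precaution, oversampling is usually applied only to the training split so that synthetic points do not leak into the evaluation data.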