With the continuous development of Internet credit business, two important practical problems have emerged: effectively evaluating the repayment ability of individual borrowers and appropriately reducing potential credit risk. Starting from the extraction of the main features that affect the repayment behavior of loan users, this thesis uses the relevant platform data sets to train and analyze repayment models, builds a combined prediction model, and offers suggestions for reducing credit risk. The data set of loan users is first preprocessed: missing values are filled, duplicate user records are removed, and the data are normalized. Statistics such as the most relevant value, the mean, and counts are used to derive user-level features, which are then merged into the related data tables. The synthetic minority oversampling technique (SMOTE) is used to address the class imbalance of the loan data while avoiding overfitting. Based on the correlation between features, features that are highly correlated with others are removed; then, based on feature weights, features with low weight coefficients are deleted, leaving the main features that affect loan repayment behavior. The XGBoost algorithm is used to predict the repayment behavior of loan users. In numerical experiments, its accuracy, AUC, and KS values are higher than those of other classification algorithms such as random forest and adaptive boosting (AdaBoost), which demonstrates the applicability of XGBoost to classifying loan users and their repayment behavior. To improve the accuracy and stability of the XGBoost model, its best parameters are obtained by grid search with cross-validation. Numerical experiments comparing the predictions of the XGBoost model before and after tuning show that the tuned model achieves good prediction performance. Finally, after preprocessing, correlation analysis, and feature screening of the user data set, a combined model of CNN and XGBoost is used to predict the risk and repayment behavior of loan users. Compared with XGBoost alone, the combined model achieves higher prediction accuracy, demonstrating its effectiveness in predicting the loan risk of loan users.
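Three of the steps named above can be written with NumPy alone, which makes their mechanics easy to inspect: SMOTE-style oversampling, correlation-based feature removal, and the AUC and KS metrics used to compare classifiers. The sketch below is illustrative and is not the thesis's implementation. The function names `smote`, `drop_correlated` and `auc_ks` are hypothetical. The values `k=5` and `threshold=0.9` are assumed defaults that the abstract does not specify. Label `1` is assumed to mark the class of interest (for example, overdue borrowers). The XGBoost, random forest and grid-search steps are not shown because they depend on external libraries.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """SMOTE-style oversampling: create n_new synthetic minority samples.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise squared distances between minority samples.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def drop_correlated(X, threshold=0.9):
    """Greedily keep a column only if |Pearson r| with every kept column is <= threshold.

    Returns the indices of the kept columns.
    """
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return keep


def auc_ks(y_true, scores):
    """Return (AUC, KS) for binary labels (1 = class of interest) and model scores."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    # AUC = P(score of a positive > score of a negative), ties counted as one half.
    diff = pos[:, None] - neg[None, :]
    auc = (diff > 0).mean() + 0.5 * (diff == 0).mean()
    # KS = max over thresholds t of |TPR(t) - FPR(t)|, predicting positive when score >= t.
    thr = np.unique(s)
    tpr = (pos[None, :] >= thr[:, None]).mean(axis=1)
    fpr = (neg[None, :] >= thr[:, None]).mean(axis=1)
    ks = np.abs(tpr - fpr).max()
    return float(auc), float(ks)


if __name__ == "__main__":
    print(auc_ks([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # (0.75, 0.5)
```

In credit scoring, KS is the largest gap between the cumulative score distributions of the two classes, which equals the maximum of |TPR - FPR| over all thresholds. SMOTE interpolates new minority points instead of duplicating existing ones, and that is why it is less prone to overfitting than plain random oversampling. The pairwise distance and AUC matrices cost O(n²) memory, so on real loan data a library implementation such as imbalanced-learn's `SMOTE` would be the more practical choice.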