Due to changing consumption habits, people increasingly rely on credit loans to meet their daily needs. As a result, a large number of online credit platforms have emerged. Loans from these platforms are characterized by simple application procedures and fast disbursement, and the platforms reach a wider population than traditional financial institutions, so they have become very popular. However, information asymmetry is a key problem in this emerging market: surveys show that the loan default rate of online credit platforms is much higher than that of traditional banks. Accurate prediction of credit default risk is therefore essential for investors and financial institutions. Building on research into credit default risk, this thesis combines machine learning and big data technology to construct an intelligent credit risk evaluation system. The main contributions are as follows:

(1) Building on the sequential backward feature selection algorithm, we propose a sequential backward feature selection method based on feature contribution. The method filters features iteratively, halts according to a built-in stopping strategy, and outputs a feature subset, providing a stable and powerful feature selection method for credit default prediction.

(2) To address the class imbalance of credit data, and drawing on research into undersampling, we propose a grouping equalization undersampling method that uses each sample's predicted probability as the grouping criterion. The method mitigates sample imbalance while improving the predictive performance of the classification model.

(3) Combining the two methods above, we build a two-level model that both supplies robust, high-quality features to credit default prediction models and yields a balanced credit default dataset. Compared with the original feature set, the combined two-level model helps improve the predictive performance of the credit default model.
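The contribution-based backward selection in (1) can be sketched as follows. This is a minimal illustration under our own assumptions, not the thesis's exact procedure: a feature's contribution is taken to be the drop in validation AUC when that feature is removed, a least-squares linear scorer stands in for the credit classifier, and the stopping strategy is a hypothetical tolerance `tol` (stop once every remaining feature contributes more than `tol`). The names `backward_select`, `fit_score`, `auc`, and `tol` are illustrative only.

```python
import numpy as np


def auc(scores, y):
    """Rank-based AUC: probability that a random default outscores a random non-default."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


def fit_score(X_tr, y_tr, X_va, y_va):
    """Validation AUC of a least-squares linear scorer (a stand-in for any classifier)."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    w, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    A_va = np.column_stack([np.ones(len(X_va)), X_va])
    return auc(A_va @ w, y_va)


def backward_select(X_tr, y_tr, X_va, y_va, tol=0.005, min_features=1):
    """Sequential backward selection driven by feature contribution (hypothetical variant).

    Contribution of feature f = validation score with f minus validation score without f.
    The weakest feature is dropped repeatedly; the loop stops when every remaining
    feature contributes more than tol, or when min_features features are left.
    """
    selected = list(range(X_tr.shape[1]))
    base = fit_score(X_tr, y_tr, X_va, y_va)
    while len(selected) > min_features:
        score_without = {}
        for f in selected:
            rest = [g for g in selected if g != f]
            score_without[f] = fit_score(X_tr[:, rest], y_tr, X_va[:, rest], y_va)
        # the feature whose removal hurts the score least has the smallest contribution
        weakest = max(score_without, key=score_without.get)
        if base - score_without[weakest] > tol:  # assumed stopping strategy
            break
        selected.remove(weakest)
        base = score_without[weakest]
    return selected


# Synthetic data: features 0-2 drive default risk, features 3-5 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = (X[:, 0] + X[:, 1] - X[:, 2] + 0.5 * rng.normal(size=600) > 0).astype(int)
selected = backward_select(X[:400], y[:400], X[400:], y[400:])
print("selected features:", selected)
```

Each pass refits the model once per remaining feature, so the cost grows roughly quadratically with the number of features; a cheaper contribution measure, such as permutation importance, could be substituted without changing the loop.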
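The grouping equalization undersampling in (2) can be illustrated with a minimal sketch. The abstract states only that each sample's predicted probability serves as the grouping criterion; everything else here is our assumption: all default samples are kept, non-default samples are sorted by a preliminary model's predicted default probability and split into equal-size groups, and the same number is drawn at random from each group until both classes are the same size. `grouped_undersample`, `n_groups`, and `seed` are hypothetical names, and the probabilities in the example are random placeholders rather than real model output.

```python
import numpy as np


def grouped_undersample(proba, y, n_groups=5, seed=None):
    """Return indices of a class-balanced subset (hypothetical grouping scheme).

    proba: predicted default probability for every sample, from any preliminary model.
    y: 0/1 labels, where 1 (default) is the minority class.
    All minority samples are kept; majority samples are sorted by proba, split into
    n_groups equal-size groups, and an equal share is drawn at random from each group.
    """
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    order = majority[np.argsort(proba[majority], kind="stable")]
    groups = np.array_split(order, n_groups)
    # spread the minority count across the groups so the class totals match
    quotas = np.full(n_groups, len(minority) // n_groups)
    quotas[: len(minority) % n_groups] += 1
    # a group smaller than its quota contributes all of its samples
    kept = [rng.choice(g, size=min(q, len(g)), replace=False) for g, q in zip(groups, quotas)]
    return np.concatenate([minority, *kept])


# Synthetic example: 100 defaults among 1,000 loans, with placeholder probabilities.
rng = np.random.default_rng(0)
y = np.zeros(1000, dtype=int)
y[:100] = 1
proba = rng.random(1000)
idx = grouped_undersample(proba, y, n_groups=5, seed=0)
print("defaults:", int((y[idx] == 1).sum()), "non-defaults:", int((y[idx] == 0).sum()))
# prints: defaults: 100 non-defaults: 100
```

Stratifying by predicted probability guarantees that every probability range of the non-default class is represented in the balanced set, which plain random undersampling achieves only in expectation.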