Default Risk Prediction And Credit Score Of Housing Loans

Posted on:2024-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:Z F Lin

Full Text:PDF

GTID:2530307181953819

Subject:Applied Statistics

Abstract/Summary:

The real estate industry occupies an important position in China’s economic system. At the end of 2021, the real estate industry accounted for 6.78% of national GDP. Some down-payment loan products have now appeared in China’s market, and even buyers without the ability to repay can obtain loans from banks, so defaults occur frequently. Risk management of personal housing loans has therefore become one of the key concerns for many banks. This thesis analyses housing loan data. First, data preprocessing, feature engineering and data exploration are carried out on the data set. Second, variables are selected and default risk prediction models are established. Finally, the best model is used to build a scorecard. The specific contents are as follows:

(1) Data processing. First, data cleaning is performed, mainly for missing values and outliers. Variables with a missing rate greater than 30% are deleted; for variables with a missing rate below 30%, numerical variables are filled with the median and categorical variables with the mode. The 3σ criterion is used to identify outliers: if outliers make up less than 1% of a variable's values, they are handled in the same way as missing values; otherwise they are left unprocessed. Next, feature engineering is carried out, covering variable derivation, data set partitioning, feature binning, and WOE and IV calculation, and the class imbalance problem is handled with a combination of oversampling and undersampling.

(2) Data exploration. Descriptive statistics and histograms are used to explore and analyse the data. In terms of basic customer information, customers aged (20,25] have a higher default rate, and the default rate gradually decreases with age; female customers have a higher risk of default, and customers with less education have a higher default rate. In terms of family circumstances, unmarried clients have higher default rates and widowed clients have the lowest. In terms of employment, customers with fewer working years and lower income have a higher default rate.

(3) Feature selection. Random forest, logistic regression and XGBoost models are combined with the TOPSIS comprehensive evaluation method to obtain a composite score for each variable, which is used to rank the variables and produce a feature ranking table. The number of features is determined in light of the performance of the subsequent models, and 20 variables, such as external standard score_3 and age, are finally selected.

(4) Establishment of the default risk prediction model and credit scorecard. A variety of single default risk prediction models are built on the housing loan data and evaluated on the test set. The single models then serve as the primary classifiers of a fusion model, with logistic regression as the secondary classifier. All the models are then compared, and the fusion model performs best: its recall rates for the non-default and default samples are 0.9697 and 0.9269, respectively, its AUC reaches 0.98, and its KS value is 0.3933. Finally, credit scorecards for housing loan customers are constructed based on the fusion model. According to the scorecard results, all customers with a score below 370 are defaulting customers. The default rate drops sharply across the score range from 420 to 480. Groups with higher scores show lower default ratios, and no customer with a score of 702 or more defaults.
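The cleaning rules in item (1) can be illustrated with a minimal sketch. It assumes the data sit in a pandas DataFrame; the function name `clean`, its parameter names, and the choice to keep variables whose missing rate is exactly 30% are assumptions made for illustration, not details taken from the thesis.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, max_missing: float = 0.30,
          max_outlier_share: float = 0.01) -> pd.DataFrame:
    """Sketch of the missing-value and 3-sigma rules (hypothetical helper)."""
    # Delete variables whose missing rate is greater than 30%.
    df = df.loc[:, df.isna().mean() <= max_missing].copy()

    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            s = df[col].astype(float)
            # 3-sigma criterion: flag values more than 3 std from the mean.
            outlier = (s - s.mean()).abs() > 3 * s.std()
            # Outliers are treated as missing only when they are rare (< 1%).
            if 0 < outlier.mean() < max_outlier_share:
                s = s.mask(outlier)
            # Numerical variables are filled with the median.
            df[col] = s.fillna(s.median())
        else:
            # Categorical variables are filled with the mode.
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df
```

Calling `clean(raw_df)` on a raw table returns a copy with no missing values. Whether the thesis recomputes the median after masking outliers is not stated, so that ordering is also an assumption.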
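The WOE and IV calculation mentioned in item (1) can be sketched in a similar way. The abstract does not give the formulas, so this assumes one common convention: for each bin, WOE is the natural log of the bin's share of non-defaults divided by its share of defaults, and IV sums (non-default share − default share) × WOE over the bins. The function name `woe_iv` and the output column names are hypothetical, and bins with zero defaults or zero non-defaults would need smoothing that is not shown here.

```python
import numpy as np
import pandas as pd


def woe_iv(bins: pd.Series, target: pd.Series) -> tuple[pd.DataFrame, float]:
    """Return a per-bin WOE table and the variable's IV.

    bins   -- bin label of each customer (e.g. an age band)
    target -- 1 for default, 0 for non-default
    """
    counts = pd.crosstab(bins, target)
    good = counts[0] / counts[0].sum()   # share of all non-defaults in each bin
    bad = counts[1] / counts[1].sum()    # share of all defaults in each bin
    woe = np.log(good / bad)
    iv = float(((good - bad) * woe).sum())
    table = pd.DataFrame({"good_share": good, "bad_share": bad, "woe": woe})
    return table, iv
```

Under this sign convention a bin with a positive WOE holds relatively more non-defaulting customers; some references flip the ratio, which changes the sign of WOE but not the IV.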

Keywords/Search Tags:

Default Risk of Housing Loan, TOPSIS, Machine Learning, Fusion Model, Credit Score Card

Related items

1	Research On The Prediction Of Personal Credit Loan Default Risk Based On Ensemble Learning
2	Application Of Model Fusion In Monitoring Loan Default Risk
3	Application And Comparison Of Machine Learning Algorithms In Credit Card Fraud Identification
4	A Study On Credit Default Prediction Based On Supervised Learning
5	Research On Credit Evaluation Of Bank Personal Credit Loan Based On SMOTE-Logistic Regression Algorithm
6	Construction And Application Of P2P Loan Default Prediction Model Based On Stacking
7	Research On Optimization Of Personal Credit Default Probability Estimation
8	Identification Of Enterprise Loan Default Risk Based On Machine Learning
9	Prediction And Optimization Of Default Risk Of Chinese Bonds Based On LightGBM
10	The Prediction Of Personal Loan Default Based On LightGBM Model