Research Of Classification On Imbalanced Data Sets And Its Application In Student Loans Credit Risk Management

Posted on: 2013-02-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Tang
Full Text: PDF
GTID: 1118330371980968
Subject: Systems Engineering
Abstract/Summary:
Classification on imbalanced data sets is one of the tasks of data mining. Pre-processing the data and optimizing the classifier can improve accuracy on the positive class, whereas traditional classification methods may lose accuracy on such data. Managing the default risk of student loans is an imbalanced classification problem, and how to effectively prevent and control default is a crucial task in this field. This dissertation studies classification on imbalanced data sets and its application in student loan credit risk management. The main work is as follows:

Research on classification in this field, both domestic and abroad, is reviewed, focusing mainly on the problems and hot spots that imbalanced classification faces. The basic differences between domestic and foreign work in individual credit scoring and in imbalanced classification techniques for student loan risk management are also compared, and the defects and restrictive factors of quantitative research on student loan risk management in China are pointed out.

This dissertation addresses the two main areas of imbalanced data classification: data set pre-processing and algorithm optimization. Based on an analysis of the features and defects of oversampling, and inspired by wrapper attribute selection, it proposes the wrapper synthetic minority over-sampling technique (Wrapper-SMOTE) to solve the imbalanced classification problem. Experimental tests verify that Wrapper-SMOTE outperforms SMOTE and improves accuracy on the positive class.

An improved particle swarm optimization, Genetic Selection Strategy Particle Swarm Optimization (GSSPSO), is proposed, which uses the convergence features of the particle swarm to optimize the parameters of the support vector machine (SVM). We focus on the cost and weight parameters of the SVM and search for their best fitness. This helps the SVM find reasonable weights for imbalanced data sets whose classes have different costs, so that the separating hyperplane quickly shifts toward the positive class. Experimental results on UCI data sets show that the optimized SVM is effective in improving the fitness and accuracy of the positive class.

This dissertation explores quantitative research methods for student loan default risk management. The data come from the loan records of 57836 graduate, undergraduate and college students from 106 majors in 10 different areas in Wuhan between 2001 and 2008. Compared with traditional methods, applying both Wrapper-SMOTE and GSSPSO-SVM to classify risk in this National Student Loan data set yields improved results. The study helps universities and commercial banks track and evaluate the default of potential customers and reduce the default rate of student loans. The research results can help improve and guide National Student Loan policy.
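For readers unfamiliar with the oversampling step that the abstract builds on, the following is a minimal sketch of plain SMOTE interpolation. It is not the dissertation's Wrapper-SMOTE variant, whose details are not given in this abstract. The function name `smote`, its parameters, and the toy data are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by plain SMOTE interpolation.

    Hypothetical helper for illustration only; it is not Wrapper-SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (assumed data, two features per sample).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
print(len(new_points))  # prints 6
```

Every synthetic point lies on a line segment between two existing minority samples, so the minority class grows without exact duplicates being added.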
Keywords/Search Tags: Classification, Imbalanced Data Sets, Oversampling, Support Vector Machine, Student Loan, Credit Risk Management