| Credit is a simple and popular line of business among financial institutions. With the rapid development of Internet finance, evaluating credit risk effectively in order to reduce it has become increasingly important. In actual business scenarios, some credit prediction problems have little labeled data, and supervised learning predicts poorly. This paper applies a self-taught learning algorithm to this problem, attempting to improve prediction on a small amount of labeled data with the help of a large amount of unlabeled data. Self-taught learning does not require the labeled and unlabeled data to come from the same distribution, so this paper tests both the same-distribution and the different-distribution cases. The experiments use two data sets from the DC competition. The first is credit default data provided by a banking institution, in which the labeled and unlabeled data follow the same distribution. The second is credit data from an actual business scenario provided by Xiamen International Bank, in which the labeled and unlabeled data follow different distributions. When the self-taught learning algorithm is applied directly to structured credit data, the model predicts worse than the base model (supervised learning). To improve prediction performance, the algorithm is modified in two stages. The first improved model is based on sparse autoencoding of the labeled data and sparse autoencoding of the labeled and unlabeled data together. With the amount of unlabeled data held fixed, the amount of labeled data is varied. After this initial improvement, the prediction performance changes with the amount of labeled data but remains unsatisfactory. The model is then improved further: the data obtained by encoding the labeled data are used as derived data and modeled together with the original data. When the amount of labeled data is small, the further improved model predicts better than the base model. |
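
The final modeling step described in the abstract, in which encoded labeled data serve as derived data alongside the original data, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the autoencoder architecture or say how it is trained, so the encoder weights here are random placeholders standing in for a trained sparse autoencoder, and the names `encode`, `augment`, `weights`, and `biases` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used for the hidden layer in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, weights, biases):
    """Hidden-layer activations of an autoencoder for one sample x."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def augment(x, weights, biases):
    """Return the original features followed by the derived (encoded) ones."""
    return list(x) + encode(x, weights, biases)


# Assumption: placeholder encoder parameters. In practice these would come
# from a sparse autoencoder trained beforehand; they are random here.
rng = random.Random(0)
n_features, n_hidden = 4, 3
weights = [[rng.uniform(-1, 1) for _ in range(n_features)]
           for _ in range(n_hidden)]
biases = [0.0] * n_hidden

# Two made-up labeled samples with four features each.
labeled_x = [[0.2, 1.0, 0.0, 0.5], [0.9, 0.0, 1.0, 0.1]]

# Each augmented row has n_features + n_hidden columns and would be passed,
# with its label, to the same supervised learner used as the base model.
augmented_x = [augment(x, weights, biases) for x in labeled_x]
```

Keeping the original columns next to the derived ones means the supervised learner loses no information relative to the base model, whatever the quality of the learned encoding.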