
Data Compression And Complement Approaches In Factorization Machine Based Credit Prediction

Posted on: 2019-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: G Zhu
Full Text: PDF
GTID: 2428330596966404
Subject: Software engineering
Abstract/Summary:
Predicting the credit of enterprises and individuals is of great significance in financial lending. A credit prediction model predicts the credit type of a debtor to decide whether the debtor can obtain a loan, which can substantially reduce the lending risk of banks and other financial institutions. Small and micro enterprises are an important part of China's enterprises; however, they usually have difficulty obtaining loans from banks. With the rapid development of the Internet, small and micro enterprises leave more and more information online, and using this Internet data to predict their credit can inform credit lending to small and micro businesses. Features of Internet data usually interact with one another, and the Factorization Machine has advantages in handling this type of data, so this thesis studies the application of the Factorization Machine to credit prediction. Because credit data contains repeated data and missing values, removing the repeated data improves the efficiency of credit prediction, and complementing the missing values improves its quality. The main research work of this thesis is as follows:

(1) The application of the Factorization Machine model to credit prediction is studied. This thesis examines the Factorization Machine in depth, analyzes its principles and advantages in detail, and applies it to credit prediction. Credit prediction experiments are carried out on four enterprise credit data sets, and the performance of the Factorization Machine model is compared with that of commonly used classification algorithms. The experimental results show that the Factorization Machine performs well on multiple evaluation metrics and is suitable for credit data sets.

(2) To deal with repeated data in credit data sets, an algorithm named STH-ML, based on Block Structure (BS) and Hash Learning, is proposed. It is mainly used to generate the mapping files required by BS and works together with BS to compress the data set. Credit data is relational data, which causes a certain amount of repeated data. Work on the Factorization Machine has proposed using Block Structure to avoid repeated data and thereby compress the data size. However, few methods exist for the key step of generating the mapping files. Building on the Self-Taught Hashing (STH) algorithm, this thesis proposes a BS-based algorithm to generate the mapping files. The experimental results show that STH-ML improves performance by up to 6.78%.

(3) To deal with missing values in credit data sets, a label-based multi-view fusion method for data complement is proposed. Some credit data contains missing values, yet such data usually still has research value, and complementing the missing values improves the quality of credit prediction. Using the credit labels of existing samples and drawing on the idea of collaborative filtering in recommender systems, this thesis proposes a label-based multi-view fusion method named LMVFM. The experimental results show that, compared with classical data complement algorithms, LMVFM reduces the complementing error by up to 4.13% and can be applied well to enterprise and personal credit data sets.
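To illustrate how a Factorization Machine models interactions between features, the following is a minimal sketch of the standard second-order Factorization Machine score, not the thesis's own implementation. The function names, the parameter names (`w0`, `w`, `V`), and the toy values are all assumptions chosen for illustration. The sketch computes the pairwise interaction term both directly over all feature pairs and through the well-known reformulation that is linear in the number of features.

```python
from itertools import combinations


def fm_predict_naive(x, w0, w, V):
    """Second-order FM score computed directly over all feature pairs."""
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        # Interaction weight of features i and j is the dot product
        # of their latent factor vectors V[i] and V[j].
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict(x, w0, w, V):
    """Same score, using the O(k * n) reformulation of the pairwise term."""
    n = len(x)
    k = len(V[0])
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical toy parameters: 3 features, 2 latent factors per feature.
x = [1.0, 0.0, 2.0]
w0 = 0.1
w = [0.2, -0.3, 0.5]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]

score = fm_predict(x, w0, w, V)
```

Both functions return the same score for the same input. The second form is the one usually preferred in practice because it avoids enumerating every feature pair, which matters for sparse, high-dimensional data.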
Keywords/Search Tags:Factorization Machine, Credit Prediction, Block Structure, Data Complement, Hash Learning