Large-scale Classification With Incomplete Networks

Posted on: 2021-11-09  Degree: Master  Type: Thesis
Country: China  Candidate: S Xu  Full Text: PDF
GTID: 2518306113969419  Subject: Statistics
Abstract/Summary:
In recent years, as consumption habits have changed and Internet technology has developed, the consumer finance industry has boomed, and more and more enterprises have entered the field. However, as consumer finance has become more widely available, loan fraud has also been on the rise. Assessing customer credit during lending is therefore particularly important and is a key link in risk control. Traditional risk control uses a large amount of user information as covariates to build statistical or machine learning models, and uses the model output to determine each customer's credit level. In some cases, however, the covariates used for modeling carry limited information that is of little use for distinguishing customers. Sometimes the covariates are missing at a high rate or are completely unavailable, so no objective judgment can be made about the credit level of some customers. By contrast, the relationship network between customers is often relatively easy to build, for example from application information, social account information, and telephone call records. We can therefore consider using the network to identify good customers.

In the study of network data, community detection is a very important problem. Its idea is to divide the whole network into several sub-networks according to the connection density of the nodes, such that the nodes within each sub-network share similar attributes. In layman's terms, it rests on the belief that birds of a feather flock together: there should be more ties between people of the same type and fewer ties between people of different types. Based on this community detection hypothesis and the core idea of graph-based semi-supervised learning, this thesis proposes an algorithm that uses the relational network to distinguish good customers from bad ones, named the Network Label Propagation algorithm.

In practice, if the network is built solely from loan applicants, it will be sparse and the influence of loan intermediaries will be ignored. We therefore consider a large network consisting of loan applicants and all of their associated people. In such networks, however, the labels of only a few customers are usually known, because most customers have no repayment record. The Network Label Propagation algorithm needs only the relationship network of all customers and the true labels of a small number of labeled customers (those who already have a repayment record) to predict the true labels of unlabeled customers (rejected customers or people who have not applied). It can help consumer finance companies evaluate personal credit before lending and thereby supports risk control.

Under some reasonable conditions, this thesis proves that the Network Label Propagation algorithm converges and that it converges to a unique value. The consistency of the algorithm is also proven under the corresponding data generation mechanism. Through applications to simulated and real data, this thesis verifies that the Network Label Propagation algorithm achieves good prediction results in various situations, and that its performance is excellent in some cases.
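The abstract does not give the update rule of the Network Label Propagation algorithm, so the following is only a minimal sketch of the generic graph-based label propagation idea it builds on, not the thesis's method. It assumes an undirected network given as a dense adjacency matrix, labels coded as 1.0 (good), 0.0 (bad), and NaN (unlabeled), and at least one labeled node; the function name `propagate_labels` and its parameters are hypothetical.

```python
import numpy as np

def propagate_labels(adjacency, labels, max_iter=1000, tol=1e-8):
    """Generic clamped label propagation (illustrative, not the thesis's algorithm).

    adjacency: (n, n) symmetric array of non-negative edge weights.
    labels: length-n array with 1.0 (good), 0.0 (bad), np.nan (unlabeled).
    Returns a length-n array of scores in [0, 1].
    """
    A = np.asarray(adjacency, dtype=float)
    y = np.asarray(labels, dtype=float)
    known = ~np.isnan(y)

    degree = A.sum(axis=1)
    # Row-normalise; isolated nodes get a dummy degree of 1 to avoid 0/0.
    P = A / np.where(degree > 0, degree, 1.0)[:, None]

    # Unlabeled nodes start at the mean of the known labels.
    scores = np.where(known, y, y[known].mean())

    for _ in range(max_iter):
        # Each node takes the weighted average score of its neighbours.
        updated = P @ scores
        # Isolated nodes have no neighbours, so they keep their score.
        updated = np.where(degree > 0, updated, scores)
        # Clamp labeled nodes to their observed labels.
        updated[known] = y[known]
        converged = np.max(np.abs(updated - scores)) < tol
        scores = updated
        if converged:
            break
    return scores

# Two tightly connected groups {0, 1, 2} and {3, 4, 5} joined by one edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

# Only node 0 (good) and node 5 (bad) have repayment records.
labels = np.array([1.0, np.nan, np.nan, np.nan, np.nan, 0.0])
scores = propagate_labels(adjacency, labels)
predicted_good = scores > 0.5
```

In this toy network the unlabeled nodes in the group containing the known good customer end up with scores above 0.5, and those in the other group end up below it, which is the "birds of a feather" behaviour the abstract describes. The 0.5 threshold is an illustrative choice.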
Keywords/Search Tags: Consumer Finance, Risk Control, Network Data, Network Label Propagation Algorithm