Second-hand Car Financial User Portrait System Based On The Combination Model Of Gradient Boosting Decision Tree

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: H N Pan
Full Text: PDF
GTID: 2428330602477693
Subject: Computer technology
Abstract/Summary:
Online second-hand car trading started late in China, and the domestic credit system still needs improvement. At present, financial users of second-hand cars in China are mainly screened with a rule-based model system, which judges a user's level of financing intent from manual phone calls and the content the user has recently browsed. This approach has low coverage, rigid application conditions, and low service efficiency, and it cannot adapt to increasingly frequent business changes. A more complete and efficient system is therefore needed.

To address this, the thesis combines the Gradient Boosting Decision Tree (GBDT), which has a simple model structure, handles nonlinear relationships well, and is highly interpretable, with the Logistic Regression (LR) model, which has simple logic and executes quickly. By analyzing and mining users' recent browsing log data on the platform, the system predicts the probability that a user will complete a financial transaction in the following week, so that second-hand car users can be served more efficiently and comprehensively.

The main work of this thesis is as follows:
(1) Mass data processing and efficiency optimization: a Spark distributed cluster is used for data processing, which improves processing speed and avoids local memory consumption. Wide/narrow table conversion, time-interval flags, and Spark's in-memory persistence strategy are used to improve processing performance.
(2) Unbalanced data processing: a new sample expansion method based on business logic is proposed to expand the rare positive samples. Negative samples are randomly under-sampled to balance the proportion of positive and negative samples.
(3) Feature engineering: time windows and time-weighted processing are used to increase the dimensionality of the features. Multi-dimensional financial features of users are constructed, including statistics on the distribution of the price ranges users viewed and their latest purchase time over the past 7 days.
(4) Modeling: a GBDT+LR model is used for prediction. GBDT performs the preliminary modeling, in which continuous values are discretized and encoded, and the encoded output is then passed to the LR model for the final prediction.

The system was compared with the existing online system on several key indicators through A/B testing (in which several models run in the same environment, their actual performance is compared, and the best model is selected). The results show that the system is more accurate, covers more users, and executes faster, a substantial improvement over the online system. It has high practical value for identifying second-hand car financial users.
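Steps (2) and (4) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes scikit-learn in place of Spark, uses synthetic data in place of browsing logs, and omits the business-logic sample expansion, which the abstract does not describe. The helper names (`undersample_negatives`, `fit_gbdt_lr`, `predict_proba`) and all parameter values are hypothetical. Each GBDT tree maps a sample to a leaf, the leaf indices are one-hot encoded, and LR is trained on that encoding.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder


def undersample_negatives(X, y, ratio=1.0, seed=0):
    """Keep every positive and a random subset of negatives.

    `ratio` is the number of negatives kept per positive (hypothetical knob).
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = np.concatenate([pos, rng.choice(neg, size=n_keep, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]


def fit_gbdt_lr(X_gbdt, y_gbdt, X_lr, y_lr, n_trees=50, seed=0):
    """Fit GBDT, then fit LR on the one-hot encoded leaf indices."""
    gbdt = GradientBoostingClassifier(
        n_estimators=n_trees, max_depth=3, random_state=seed
    )
    gbdt.fit(X_gbdt, y_gbdt)
    # apply() gives one leaf index per sample per tree; for binary
    # classification the last axis has size 1, so it is dropped here.
    leaves = gbdt.apply(X_lr)[:, :, 0]
    # Leaves unseen while fitting the encoder are encoded as all zeros.
    encoder = OneHotEncoder(handle_unknown="ignore")
    lr = LogisticRegression(max_iter=1000)
    lr.fit(encoder.fit_transform(leaves), y_lr)
    return gbdt, encoder, lr


def predict_proba(models, X):
    """Return the predicted probability of the positive class."""
    gbdt, encoder, lr = models
    leaves = gbdt.apply(X)[:, :, 0]
    return lr.predict_proba(encoder.transform(leaves))[:, 1]


# Synthetic, imbalanced stand-in for the user feature table (about 5% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Balance the training set only; the test set keeps the original class ratio.
X_bal, y_bal = undersample_negatives(X_train, y_train, ratio=1.0, seed=0)

# Train GBDT and LR on disjoint halves so LR does not fit leaves that
# GBDT has already memorized.
X_g, X_l, y_g, y_l = train_test_split(
    X_bal, y_bal, test_size=0.5, stratify=y_bal, random_state=0
)
models = fit_gbdt_lr(X_g, y_g, X_l, y_l)
proba = predict_proba(models, X_test)
```

Because the negatives are under-sampled before training, the predicted probabilities are shifted relative to the original class ratio. They are suitable for ranking users, but would need recalibration before being read as true transaction probabilities.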
Keywords/Search Tags:Unbalanced data processing, Spark SQL, User portrait, Machine learning, Efficiency optimization