
The First Hitting Time Model with Deep Neural Network and Factorization Machine (DeepFM-FHT Model)

Posted on: 2021-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Lin
GTID: 2518306131981969
Subject: Statistics
Abstract/Summary:
In the era of big data, people face a flood of information: comments, long and short videos, news, and e-commerce products. Each individual is both a consumer and a producer of data. As web users we generate large amounts of data every day, for example by shopping, chatting, taking part in online communities, writing articles, and browsing videos. The number of Internet users in China keeps growing, Internet penetration keeps rising, and the supporting infrastructure continues to develop. With the growth of the Industrial Internet, big data has penetrated many industries and fields and has become an increasingly important factor of production, so using and mining data has become very important.

At present, practitioners in recommendation marketing scenarios still rely on models such as logistic regression (LR) and XGBoost, which have two disadvantages when processing such data. One is the handling of time; the other is the use of censored data. Logistic regression and tree models are not well suited to learning from time, and doing so easily causes overfitting. Censored data, in turn, tend to be regarded as incomplete and are either discarded or simply treated as "0". In some special scenarios, when the sample size is small, this poses a great challenge to the model's predictive performance and generalization ability.

Survival analysis models avoid both disadvantages. Survival analysis was originally applied in the medical field, where it was designed to accommodate censored data. Compared with the LR model, a survival analysis model treats the study time and the censoring indicator together as the dependent variable. The most widely used model in survival analysis is the Cox proportional hazards model, but it rests on strict assumptions. First, the hazard ratio must not change with time; that is, as time passes, the hazards must remain in a constant proportion to each other. Second, the log-hazard ratio must be linear in the covariates (the various survival factors).

This paper brings deep learning to the task of learning the relationship between the first hitting time (FHT) and the covariates. No assumption is made about the underlying stochastic process, and the DeepFM framework is applied to learn appropriate covariate patterns through a deep neural network. To accelerate convergence and improve prediction accuracy, the loss function incorporates three components: the likelihood function of the relationship between the FHT and the event time, a discrimination index, and a goodness-of-fit index.

In the empirical analysis, the proposed method is first applied to a biomedical data set and a simulated data set, with the C-index as the evaluation metric, and compared with several other common survival analysis models. The results show that the proposed method performs better. The method is then applied to an e-commerce transaction data set. Feature engineering and model comparison show that the proposed method outperforms logistic regression on metrics such as AUC and accuracy, but is slightly worse than XGBoost. However, when the time variable is included as a feature, XGBoost overfits. These results indicate that the proposed method can be used in recommendation marketing, especially in scenarios that focus on time, such as predicting when a user will churn or when a user will default. The paper closes with a summary, an outline of future research directions, and a discussion of applicable scenarios.
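The C-index used above as the evaluation metric can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes Harrell's formulation for right-censored data, and the function name `concordance_index`, its parameters, and the toy data are hypothetical. A pair of subjects is comparable only when the one with the shorter time had an observed event; the pair is concordant when that subject also has the higher predicted risk.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed time for each subject (event or censoring time)
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- model output; a higher score should mean an earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A censored subject cannot be the "earlier" member of a pair,
        # because its true event time is unknown.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data (hypothetical): the third subject is censored at time 6.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk_scores))  # 4 of 5 comparable pairs agree: 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The censored subject still contributes as the later member of a pair, which is how the metric uses censored observations instead of discarding them.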
Keywords/Search Tags: Deep Learning, Factorization Machine, First Hitting Time, Recommendation Marketing, Survival Analysis