
Research And Application Of Variable Selection Method Based On Interval Censorin

Posted on: 2022-04-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Zhao
GTID: 1520306347451674
Subject: Financial statistics and risk management
Abstract/Summary:
In many fields, such as biomedicine, economics, finance, demography, ecology and environmental protection, researchers often cannot observe the exact time at which the event of interest occurs; the resulting data are called censored data. Depending on how the observation period relates to the event time, censoring falls into three categories: left censoring, right censoring and interval censoring. Many earlier studies simply discarded incomplete observations to simplify modeling, but this wastes the information those observations carry. Survival analysis is an effective framework for censored data because it incorporates each subject's status at the observation times into the model, so that partially observed samples still contribute information. On this basis, this thesis first outlines the basic concepts of survival analysis and the mainstream variable selection methods, reviews previous research on variable selection in survival analysis, and then presents an innovative application to right-censored data. The research is then extended to interval-censored data, for which two new methods are proposed, one from the perspective of regularization and one from the perspective of best subset selection; the properties of each estimator are established and examined by simulation. To verify how the new methods perform on real data, the thesis applies both of them, together with other commonly used variable selection methods, to an interval-censored dataset. Finally, the thesis summarizes the content and contribution of each part and discusses directions for future research.

The contributions of this thesis fall into four parts:

1. An innovative application of survival analysis with variable selection. In recent years, with the spread of mobile payments, online auto-renewal contracts and membership systems have become increasingly common. A customer signs an automatic or one-click renewal contract with a merchant on a website or app, and a third-party payment platform automatically deducts the fee when the current term expires. Such contracts naturally raise the question of customer retention. If a customer's cancellation of the contract is treated as "death", and a customer who keeps renewing throughout the observation period is treated as "censored", the problem becomes a typical survival analysis problem. In practice, a great deal of information is recorded about each customer, whereas the business usually needs only the key factors. This thesis studies such models in depth: a regularized proportional hazards model is used to filter out the truly influential factors, and the selected factors are used to build a customer churn risk score and a score-based classifier. The results show that the classifier's predictions are highly accurate across multiple evaluation dimensions. In this application, several models are compared comprehensively, and a dynamic threshold mechanism for the retention probability is built on the best-performing model.

2. From the perspective of regularization, variable selection for interval-censored data is studied, and adaptive ridge estimation, previously developed for the proportional hazards model, is extended to the additive hazards model. In this part, the sieve method is used to construct a smooth, monotone (non-increasing) baseline survival function, and an iterative algorithm updates the parameters to be estimated at each step until convergence. The thesis derives the asymptotic properties of the estimator, proving its sparsity and asymptotic normality. Numerical simulations are carried out under different scenarios, comparing estimation performance across sample sizes, covariate dimensions, observation frequencies and true baseline survival functions. The results show that the estimates are accurate and that variable selection performs well: the true positive rate is very high while the false positive rate is very low, so the method screens out the truly relevant variables effectively.

3. From the perspective of best subset selection, variable selection for interval-censored data is studied, and an estimation method based on an approximate information criterion that requires no parameter tuning is proposed. The idea combines best subset selection with an information criterion. Because the information criterion contains the ℓ0 norm, it is not smooth, so its optimum cannot be found by differentiation. This thesis therefore approximates the ℓ0 norm with a modified sigmoid function, which yields a smooth approximate information criterion. Since the information criterion involves no tuning parameter, the estimator is obtained simply by optimizing the approximate criterion. To achieve both sparsity and smoothness, a reparameterization maps the parameters to be estimated onto another set of variables. The thesis proves the consistency, sparsity and asymptotic normality of the resulting estimator. The simulations cover low and high observation frequencies as well as weak and strong signals, and compare the estimates across sample sizes and true cumulative hazard functions; the method is also compared with various other variable selection methods. Its main advantages are a very low false positive rate and accurate estimation. The average computation time of each method is also recorded, which confirms the high efficiency of the proposed method.

4. An empirical application of variable selection to interval-censored data. The thesis uses a well-established Nigerian census database created by the United States Agency for International Development. The child mortality rate calculated from the 2003 data exceeds 20%, far above the world average. Each child also has many recorded variables, which makes variable selection methods suitable for uncovering the underlying influential factors. Moreover, many children's times of death are recorded only to the month or year, which produces interval-censored data. The thesis applies the variable selection methods of Chapters 3 and 4 to this dataset, also analyzes it with several commonly used penalties and stepwise regression, and identifies the 24 variables that truly affect mortality. Finally, the estimates from the various methods are compared, and the fitted baseline survival function and baseline cumulative hazard function are presented.
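The first contribution selects churn factors with a regularized proportional hazards model and then scores customers by risk. The abstract does not name the penalty or software, so the Python sketch below is only an illustration under stated assumptions: it adds a lasso (ℓ1) penalty to the Cox negative log partial likelihood (Breslow handling of ties), treats the linear predictor x'β as the churn risk score, and flags customers whose score exceeds a threshold. The names `penalized_nll`, `risk_score`, `classify`, `lam` and `threshold` are hypothetical, not taken from the thesis.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cox_neg_log_partial_lik(beta, data):
    """Negative Cox log partial likelihood with Breslow handling of ties.

    data: list of (time, event, x); event = 1 if the customer cancelled
    (the "death"), 0 if still renewing when observation ended (censored).
    """
    nll = 0.0
    for t_i, d_i, x_i in data:
        if d_i == 0:
            continue  # censored customers enter only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(math.exp(dot(beta, x_j)) for t_j, _, x_j in data if t_j >= t_i)
        nll -= dot(beta, x_i) - math.log(risk_sum)
    return nll

def penalized_nll(beta, data, lam):
    """Lasso-penalized objective; lam is a hypothetical tuning parameter."""
    return cox_neg_log_partial_lik(beta, data) + lam * sum(abs(b) for b in beta)

def risk_score(beta, x):
    """Churn risk score: the linear predictor x'beta (higher means riskier)."""
    return dot(beta, x)

def classify(beta, x, threshold):
    """Flag a customer as a likely churner when the score exceeds the threshold."""
    return risk_score(beta, x) > threshold
```

In practice the penalized objective would be minimized with a dedicated solver (for example coordinate descent over a grid of `lam` values); coefficients driven to zero correspond to factors dropped from the score.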
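The second contribution works with the additive hazards model under interval censoring, where each subject is only known to have experienced the event in an interval (L, R]. Under the additive hazards model λ(t | x) = λ0(t) + x'β, the survival function is S(t | x) = exp(-Λ0(t) - t·x'β), and each subject contributes S(L | x) - S(R | x) to the likelihood, with R = ∞ for right-censored and L = 0 for left-censored subjects. The abstract does not say which sieve the thesis uses, so this sketch assumes a Bernstein-polynomial sieve for Λ0 on [0, τ] whose coefficients are cumulative sums of exponentials, which keeps Λ0(0) = 0 and Λ0 non-decreasing (so the baseline survival function is non-increasing). The names `cum_hazard`, `surv_additive`, `interval_loglik`, `alphas` and `tau` are hypothetical.

```python
import math

def cum_hazard(t, alphas, tau):
    """Bernstein-sieve baseline cumulative hazard Lambda0(t) on [0, tau].

    Coefficients phi_0 = 0 <= phi_1 <= ... <= phi_m are cumulative sums of
    exp(alpha), so Lambda0(0) = 0 and Lambda0 is non-decreasing in t.
    """
    m = len(alphas)  # polynomial degree
    phi = [0.0]
    for a in alphas:
        phi.append(phi[-1] + math.exp(a))
    u = min(max(t / tau, 0.0), 1.0)  # times beyond tau are clamped to tau
    return sum(phi[k] * math.comb(m, k) * u**k * (1 - u) ** (m - k)
               for k in range(m + 1))

def surv_additive(t, x, beta, alphas, tau):
    """S(t | x) = exp(-Lambda0(t) - t * x'beta) under the additive hazards model."""
    if math.isinf(t):
        return 0.0  # right-censored subjects: S(infinity) = 0
    lin = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-cum_hazard(t, alphas, tau) - t * lin)

def interval_loglik(data, beta, alphas, tau):
    """Log-likelihood for interval-censored data given as (L, R, x), L < R."""
    total = 0.0
    for left, right, x in data:
        p = (surv_additive(left, x, beta, alphas, tau)
             - surv_additive(right, x, beta, alphas, tau))
        total += math.log(p)
    return total
```

The sketch does not enforce the additive-model constraint λ0(t) + x'β ≥ 0; if it is violated, S(L | x) - S(R | x) can become non-positive and `math.log` will raise an error, so a real fitting routine would need to constrain β accordingly.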
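The adaptive ridge idea in the second contribution replaces the ℓ0 penalty with an iteratively reweighted ridge penalty: at each step the weights w_j = 1/(β_j² + δ²) are recomputed from the current estimates, so w_j·β_j² is close to 1 for clearly nonzero coefficients while small coefficients are pushed toward zero. To isolate that reweighting step, this sketch swaps the thesis's additive hazards likelihood for the simplest possible setting, a Gaussian linear model with an orthonormal design (X'X = I), where each weighted ridge update has the closed form β_j = z_j / (1 + λ·w_j) with z_j the least-squares estimate. That simplification and the defaults for `lam`, `delta` and `eps` are assumptions for illustration only.

```python
def adaptive_ridge(z, lam, delta=1e-5, tol=1e-10, max_iter=1000):
    """Adaptive ridge for an orthonormal Gaussian design (illustrative only).

    z: least-squares estimates z_j = (X'y)_j, assuming X'X = I.
    Iterates beta_j = z_j / (1 + lam * w_j) with w_j = 1 / (beta_j^2 + delta^2).
    """
    beta = list(z)  # start from the unpenalized estimate
    for _ in range(max_iter):
        weights = [1.0 / (b * b + delta * delta) for b in beta]
        new_beta = [zj / (1.0 + lam * wj) for zj, wj in zip(z, weights)]
        if max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

def selected(beta, eps=1e-4):
    """Indices of coefficients treated as nonzero after convergence."""
    return [j for j, b in enumerate(beta) if abs(b) > eps]
```

In this toy setting, a coefficient with z_j² > 4λ settles at the larger root of β² - z_j·β + λ = 0 (for z = 3 and λ = 2 that is β = 2), while smaller signals collapse to essentially zero, so `adaptive_ridge([3.0, 0.5, -3.0], lam=2.0)` keeps indices 0 and 2.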
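The third contribution replaces the ℓ0 norm inside an information criterion with a smooth sigmoid-type approximation and reparameterizes the coefficients. The abstract does not give the exact functions, so this sketch assumes one common choice: w(γ) = tanh(a·γ²), which equals 2σ(2a·γ²) - 1 for the logistic sigmoid σ and is therefore a modified sigmoid; the reparameterization β = γ·w(γ); and a BIC-type penalty log(n)·Σ w(γ_j), written so that smaller values are better. The shape parameter `a` is fixed in advance rather than tuned, and `neg_loglik` is a placeholder for the model's negative log-likelihood (for interval-censored data, one built from S(L | x) - S(R | x)). All names here are hypothetical.

```python
import math

def w(gamma, a):
    """Smooth surrogate for the indicator I(gamma != 0): tanh(a * gamma^2)."""
    return math.tanh(a * gamma * gamma)

def approx_l0(gammas, a):
    """Approximate l0 norm: sum_j w(gamma_j), differentiable in gamma."""
    return sum(w(g, a) for g in gammas)

def reparam(gammas, a):
    """beta_j = gamma_j * w(gamma_j).

    Near zero beta_j is about a * gamma_j^3, a flat region that pushes
    small coefficients to (numerically) zero while staying smooth.
    """
    return [g * w(g, a) for g in gammas]

def approx_bic(gammas, neg_loglik, n, a):
    """Smooth BIC-type criterion: 2 * NLL(beta(gamma)) + log(n) * approx_l0(gamma).

    neg_loglik maps a coefficient vector beta to the model's negative
    log-likelihood; the criterion can be minimized over gamma with any
    gradient-based optimizer, with no penalty parameter to tune.
    """
    beta = reparam(gammas, a)
    return 2.0 * neg_loglik(beta) + math.log(n) * approx_l0(gammas, a)
```

For example, with a = 50 the vector γ = (2, 0, -1.5, 0) gives `approx_l0` equal to 2.0 in double precision, matching the exact count of nonzero coefficients, while the criterion remains differentiable everywhere.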
Keywords/Search Tags: Interval censoring, Proportional hazards model, Additive hazards model, Variable selection, Child mortality, Client churn