Variable Selection Of Zero-inflated Model With Missing Covariates

Posted on: 2022-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: J R Wen
GTID: 2480306542486154
Subject: Statistics
Abstract/Summary:
In agriculture, econometrics, manufacturing, medicine, road safety, and many other fields, count data are often encountered. The distributions commonly used to fit such data are the Poisson, binomial, and negative binomial distributions. In practice, however, the observed data sometimes contain a large proportion of zeros, and the traditional count distributions are no longer applicable. Scholars have therefore combined a count distribution with a degenerate distribution at zero, proposing zero-inflated regression models to fit such data.

To handle count data with an upper bound, this thesis proposes a zero-inflated binomial regression model. Assuming that the mixing probability and the event probability each follow a logit regression model with its own parameter vector, a score equation is established to solve for the parameter estimates. To handle missing covariates in the zero-inflated binomial regression model, this thesis adopts inverse probability weighted estimation to correct the bias caused by deleting incomplete cases, uses a logistic regression model to define the weights, and establishes large-sample properties of the estimators such as consistency and asymptotic normality. A consistent estimate of the asymptotic variance-covariance matrix is also obtained. To handle numerous and correlated covariates, SCAD, MCP, and LASSO penalty functions are added to the weighted likelihood function to obtain a penalized objective function based on zero-inflated binomial regression, and the EM algorithm is then used to study parameter estimation and variable selection for the model.

To verify the validity of the proposed model, Monte Carlo simulations are carried out under different sample sizes, covariate missing proportions, and zero proportions. The simulation results show that, when covariates are missing, the inverse probability weighting method performs well for estimating the zero-inflated binomial regression model. In terms of mean squared error, the parameters of the zero-inflation part are estimated better than those of the binomial part, and the SCAD penalty outperforms the other methods. In terms of sensitivity, when the sample size is 100, the LASSO and SCAD penalties have relatively low sensitivity in the zero-inflation part, while MCP performs better; when the sample size is increased to 300, the sensitivity of all penalty functions improves. In the binomial part, the sensitivity and specificity of all methods increase with the sample size, and overall performance is better. It is also concluded that the estimates from the penalized model with no missing covariates and the estimates from the inverse probability weighted penalized model are asymptotically unbiased.
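
The penalized, inverse probability weighted objective described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: every function name, the row layout `(y, m, z, x, observed, obs_prob)`, the scaling of the likelihood by the sample size, and the choice to leave intercepts unpenalized are assumptions made for illustration. The SCAD and MCP shape parameters (`a=3.7` and `a=3.0`) are commonly used defaults, not values taken from the thesis. The sketch only evaluates the objective; it does not implement the EM algorithm or estimate the observation probabilities.

```python
import math


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def zib_logpmf(y, m, mix_prob, event_prob):
    """Log pmf of a zero-inflated binomial with m trials.

    mix_prob is the probability of a structural zero; event_prob is the
    binomial success probability.
    """
    if y == 0:
        return math.log(mix_prob + (1.0 - mix_prob) * (1.0 - event_prob) ** m)
    return (math.log(1.0 - mix_prob) + math.log(math.comb(m, y))
            + y * math.log(event_prob) + (m - y) * math.log(1.0 - event_prob))


def lasso(t, lam):
    """LASSO penalty for a single coefficient."""
    return lam * abs(t)


def scad(t, lam, a=3.7):
    """SCAD penalty for a single coefficient (a > 2)."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp(t, lam, a=3.0):
    """MCP penalty for a single coefficient (a > 1)."""
    t = abs(t)
    if t <= a * lam:
        return lam * t - t * t / (2 * a)
    return a * lam * lam / 2


def penalized_ipw_objective(rows, gamma, beta, lam, penalty=scad):
    """Negative IPW log-likelihood (divided by n) plus the penalty.

    rows: iterable of (y, m, z, x, observed, obs_prob), where z and x are
    covariate vectors (first entry 1.0 for the intercept), observed flags a
    complete case, and obs_prob is its assumed probability of being complete.
    Assumption: intercepts gamma[0] and beta[0] are not penalized.
    """
    rows = list(rows)
    total = 0.0
    for y, m, z, x, observed, obs_prob in rows:
        if not observed:
            continue  # incomplete cases contribute weight zero
        mix_prob = expit(dot(z, gamma))    # logit model, zero-inflation part
        event_prob = expit(dot(x, beta))   # logit model, binomial part
        total += zib_logpmf(y, m, mix_prob, event_prob) / obs_prob
    pen = sum(penalty(c, lam) for c in gamma[1:])
    pen += sum(penalty(c, lam) for c in beta[1:])
    return -total / len(rows) + pen
```

Passing `penalty=lasso` or `penalty=mcp` swaps the penalty without changing the weighted likelihood term, which makes it straightforward to compare the three penalties on the same data. In practice the `obs_prob` values would come from a fitted logistic regression for the missingness indicator, as the abstract describes.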
Keywords/Search Tags: Zero-inflated binomial model, Inverse probability weighting, Variable selection, Missing data