
Statistical Inference For Heteroscedastic Models

Posted on: 2014-10-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D K Xu
Full Text: PDF
GTID: 1260330392973482
Subject: Statistics
Abstract/Summary:
In the classical linear regression model, homoscedasticity of the observed data is a basic assumption, and routine statistical inference is feasible under it. If the variance of the observations is heterogeneous and unknown, regression analysis runs into many difficulties. Heteroscedastic data, that is, data whose observations have heterogeneous variance, are common in practice, so the homoscedasticity assumption is often inconsistent with reality. Generally speaking, statistical methods for heteroscedastic data fall into two classes. The first is data transformation, such as the variance-stabilizing transformation and the Box-Cox transformation. The second is modeling of the variance. The resulting models are called heteroscedastic regression models, or joint mean and variance models, because the mean and the variance are modeled simultaneously. Their main feature is the attention paid to the variance, which can explain the causes and patterns of variation in the data. This is also an important trend in data analysis. It is therefore necessary to study mean-variance models in depth.

In addition, variable selection is an important part of modern statistical analysis and an active research topic. To the best of our knowledge, most existing variable selection procedures select only the explanatory variables of the mean model, and little work has been done on selecting the explanatory variables of the variance (or dispersion) model.
Simultaneous variable selection for mean-variance models is also of great importance for understanding complex social and economic phenomena and for quality improvement experiments on industrial products.

In this dissertation, we mainly study statistical inference in joint modeling of the mean-variance structure for heteroscedastic regression models. Variable selection, the test for homogeneity of variance, and Bayesian inference are considered for heteroscedastic regression models, which include double generalized linear models and semiparametric mean-variance models, with complex data, including high-dimensional data, longitudinal data, and skewed data. More specifically, the contents of this dissertation are summarized as follows.

For the high-dimensional double generalized linear models, we propose a maximum penalized pseudo-likelihood method for simultaneous variable selection and model estimation. The proposed procedure can simultaneously select significant variables in the parametric components of both the mean model and the variance model. Furthermore, with a proper choice of tuning parameters, we show that this variable selection procedure is consistent and that the estimators of the regression coefficients have the oracle property. Simulation studies and a real example are used to assess the finite-sample performance of the proposed method.

In addition, we propose a variable selection method based on penalized likelihood approaches within the framework of joint modeling of mean and covariance structures for longitudinal data. To parameterize the covariance matrix, we use a new decomposition that has a moving average interpretation and provides a natural alternative. Simultaneous variable selection for the mean and covariance structures is fundamental to avoiding modeling biases and reducing model complexity.
We show that, under mild conditions, the proposed penalized maximum likelihood estimators of the parameters in the mean and covariance models are asymptotically consistent and normally distributed. Simulation studies show that the proposed methodology performs well.

For the semiparametric heteroscedastic regression models, we propose a variable selection method based on regularized restricted maximum likelihood (REML) approaches. B-splines are used to estimate the nonparametric component. We show that, under mild conditions, this variable selection procedure is consistent. The proposed procedure can simultaneously select significant variables in the parametric components of both the mean model and the variance model. The proposed nonparametric estimator attains the optimal rate of convergence. Simulation studies and a real example are used to assess the finite-sample performance of the proposed methodology.

For the double logistic models, we mainly study their application to the analysis of the risk factors of hypertensive disorder complicating pregnancy (HDCP). A relatively popular shrinkage-type variable selection method is used for variable selection and parameter estimation of the risk factors. Then, a more objective forecasting method based on the parameter estimates is given. We compare the forecasting results based on penalized extended quasi-likelihood estimators with the prediction accuracy based on classical maximum likelihood estimators without a dispersion structure. The prediction accuracy is higher for the double logistic models. The proposed variable selection procedure removes many variables that have little or no effect on HDCP. The prediction accuracy is also higher than that obtained without excluding variables.
This also shows that our variable selection method plays a significant role in this study.

For the skew-normal semiparametric varying coefficient models, maximum likelihood estimation based on B-splines is proposed. We show that, under mild conditions, the proposed estimators of the parameters are asymptotically consistent and normally distributed, and that the nonparametric estimator attains the optimal rate of convergence. Further, we discuss the score test for homogeneity of the variance in skew-normal semiparametric varying coefficient models. Simulated examples show that the proposed methods work well.

Finally, we propose a fully Bayesian inference for semiparametric joint mean and variance models based on B-spline approximations of the nonparametric components. An efficient MCMC method that combines the Gibbs sampler and the Metropolis-Hastings algorithm is suggested for the inference. A simulation study and real data are used to show the efficiency of the proposed Bayesian approach.
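
To make the idea of a joint mean and variance model concrete, the following is a minimal sketch and not the dissertation's method: it fits an unpenalized Gaussian model y_i ~ N(x_i' beta, exp(z_i' gamma)) by alternating weighted least squares for the mean with a Fisher scoring step for the log-variance. The log-linear variance form, the function names `fit_joint_mean_variance` and `simulate`, and the simulated data are all assumptions made for illustration; the penalized, semiparametric, and Bayesian procedures summarized above are not reproduced here.

```python
import numpy as np


def fit_joint_mean_variance(X, Z, y, n_iter=50, tol=1e-8):
    """Fit y_i ~ N(x_i' beta, exp(z_i' gamma)) by alternating updates.

    Assumption: the first column of Z is an intercept column.
    """
    # Start from ordinary least squares and a constant variance.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    gamma = np.zeros(Z.shape[1])
    gamma[0] = np.log(np.mean((y - X @ beta) ** 2))

    for _ in range(n_iter):
        # Mean step: weighted least squares with weights 1 / sigma_i^2.
        w = np.exp(-(Z @ gamma))
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)

        # Variance step: one Fisher scoring update for gamma.
        # Score is 0.5 * Z'(d / sigma^2 - 1); expected information is 0.5 * Z'Z.
        d = (y - X @ beta_new) ** 2
        sigma2 = np.exp(Z @ gamma)
        gamma_new = gamma + np.linalg.solve(Z.T @ Z, Z.T @ (d / sigma2 - 1.0))

        change = max(np.abs(beta_new - beta).max(), np.abs(gamma_new - gamma).max())
        beta, gamma = beta_new, gamma_new
        if change < tol:
            break
    return beta, gamma


def simulate(n=2000, seed=0):
    """Simulate heteroscedastic data with hypothetical true parameters."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    z = rng.uniform(-1.0, 1.0, n)
    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z])
    beta_true = np.array([1.0, 2.0])
    gamma_true = np.array([-0.5, 1.0])
    sd = np.exp(0.5 * (Z @ gamma_true))  # standard deviation depends on z
    y = X @ beta_true + sd * rng.standard_normal(n)
    return X, Z, y, beta_true, gamma_true


if __name__ == "__main__":
    X, Z, y, beta_true, gamma_true = simulate()
    beta_hat, gamma_hat = fit_joint_mean_variance(X, Z, y)
    print("beta  true/estimate:", beta_true, beta_hat)
    print("gamma true/estimate:", gamma_true, gamma_hat)
```

A nonzero slope in `gamma_hat` indicates that the variance changes with the covariate, which is the kind of structure a variance model is meant to capture; variable selection for the variance model would add a penalty on `gamma` to this objective.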
Keywords/Search Tags: Heteroscedastic regression models, Pseudo likelihood, Extended quasi-likelihood, Restricted likelihood, Variable selection