
Methods And Theories For Semiparametric Regression Models With Complex Data

Posted on: 2020-12-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z L Wang
GTID: 1360330623956565
Subject: Statistics
Abstract/Summary:
Regression analysis is a powerful tool for studying the relationships between variables. With regression, one can explain phenomena, predict future trends and provide references for policy-makers. Regression methodology is applied in almost all scientific disciplines, including biomedicine, economics and management, industry and agriculture. To fit data better, regression models have developed from the initial parametric models to semiparametric models. Semiparametric models contain both parametric and nonparametric components. They therefore retain the interpretability of parametric models and the flexibility of nonparametric models, while avoiding the curse of dimensionality that affects fully nonparametric models. Semiparametric models are well motivated by practical applications and have broad prospects and considerable value in application. Hence they have received wide attention and have become an active topic in modern statistics.

Complex data occur frequently in modern experiments and investigations, including high dimensional data, measurement error data, censored data, missing data and longitudinal data. In statistical analysis, ignoring the inherent structure of such data reduces the efficiency of statistical inference and can even lead to wrong conclusions. Statistical analysis and modeling of complex data is therefore particularly important. Many statistical problems for semiparametric regression models with complex data remain open, so studying the statistical methods and theories of these models is of great theoretical and practical significance. This dissertation mainly studies estimation and testing for semiparametric regression models with complex data such as high dimensional data, measurement error data and missing data. Specifically, the research consists of the following six parts.

(1) For sparse ultra-high dimensional partially linear varying coefficient models, we study variable selection and estimation simultaneously. We focus on the setting in which the number of variables in the linear part can grow much faster than the sample size while many coefficients are zero, and the dimension of the nonparametric part is fixed. We use a B-spline basis to approximate each coefficient function. First, we establish the convergence rates and the asymptotic normality of the linear coefficients for the oracle estimator, that is, the estimator obtained when the nonzero components are known in advance. Then we propose a nonconvex penalized estimator and derive its oracle property under mild conditions. Furthermore, we address numerical implementation and the data-adaptive choice of the tuning parameters. Monte Carlo simulations and an application to a breast cancer dataset are provided to corroborate the theoretical findings in finite samples.

(2) Unlike the statistical inference for regression coefficients that is common in the literature, we consider the problem of variance estimation in partially linear varying coefficient models. By approximating the coefficient functions locally by constants, the semiparametric model can be converted into a high dimensional linear model. A variance estimator based on the least squares method is then constructed, and its asymptotic normality is established. To reduce the mean squared error of the least squares estimator, we also propose a regularized least squares method, the ridge estimator. Finally, numerical simulations are conducted to illustrate the finite sample performance of the two proposed estimation methods.

(3) For sparse ultra-high dimensional varying coefficient models, we consider the problem of variance estimation. We first use B-splines to approximate the coefficient functions and discuss the asymptotic behavior of a naive two-stage estimator of the error variance. We show that this naive estimator may significantly underestimate the error variance because of spurious correlations, which are even higher for nonparametric models than for linear models. This prompts us to propose an accurate estimator of the error variance that integrates sure independence screening with the refitted cross-validation technique. The consistency and the asymptotic normality of the resulting estimator are established under some regularity conditions. Simulation studies are carried out to assess the finite sample performance of the proposed methods.

(4) For high dimensional linear models with errors in variables, a novel debiased procedure is developed and analyzed to construct component-wise confidence intervals for the regression coefficients. The proposed method not only accounts for measurement errors to avoid non-vanishing biases, but also compensates for the biases introduced by penalization. The resulting estimator is asymptotically unbiased and normal under mild conditions, so it can be used to construct valid confidence intervals and to conduct hypothesis tests. Results of an extensive simulation study are presented to show the efficacy and usefulness of the procedure.

(5) For high dimensional partially linear varying coefficient models, we consider a variable selection procedure when the covariates in the parametric part are measured with additive errors. Penalized bias-corrected profile least squares estimators are constructed, and their asymptotic properties are studied under some regularity conditions. The rate of convergence and the asymptotic normality of the resulting estimates are established. We further demonstrate that, with proper choices of the penalty functions and the regularization parameter, the resulting estimates perform asymptotically as well as the oracle estimator. The choice of smoothing parameters is also discussed. The finite sample performance of the proposed variable selection procedures is assessed by Monte Carlo simulation studies.

(6) For partially nonlinear models with responses missing at random, we consider a specification test for the nonparametric component. Two quadratic conditional moment test statistics are developed together with their asymptotic properties. The test methods have the virtue that p-values can be easily determined and the Type I error can be controlled asymptotically exactly. Further, the tests can detect alternatives distinct from the null hypothesis at the optimal nonparametric rate for local smoothing-based methods. Thorough Monte Carlo simulations and an application to one real data set are provided to demonstrate the finite-sample performance of the proposed methods.
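The underestimation described in part (3) can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions, not the dissertation's procedure: it uses a plain linear model instead of a B-spline approximated varying coefficient model, screens by absolute marginal correlation, and refits by ordinary least squares. The function names (`sis_select`, `ols_variance`, `naive_variance`, `rcv_variance`, `simulate`) and the default sizes are hypothetical choices for illustration. By default the response is pure noise with variance 1, so every screened variable is spurious.

```python
import numpy as np

def sis_select(X, y, k):
    """Screening step: indices of the k columns of X with the largest
    absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[-k:]

def ols_variance(X, y, cols):
    """Residual variance from least squares of y on X[:, cols] plus an intercept."""
    Z = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return resid @ resid / (len(y) - Z.shape[1])

def naive_variance(X, y, k):
    """Naive two-stage estimator: screen and refit on the same data."""
    return ols_variance(X, y, sis_select(X, y, k))

def rcv_variance(X, y, k, rng):
    """Refitted cross-validation: screen on one half, refit on the other half,
    swap the roles of the halves, and average the two estimates."""
    n = len(y)
    perm = rng.permutation(n)
    a, b = perm[: n // 2], perm[n // 2:]
    v1 = ols_variance(X[b], y[b], sis_select(X[a], y[a], k))
    v2 = ols_variance(X[a], y[a], sis_select(X[b], y[b], k))
    return 0.5 * (v1 + v2)

def simulate(n=200, p=2000, k=10, signal=0.0, sigma=1.0, seed=0):
    """Simulated data (assumed design): independent Gaussian covariates,
    the first three coefficients equal to `signal`, the rest zero."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = signal
    y = X @ beta + sigma * rng.standard_normal(n)
    return naive_variance(X, y, k), rcv_variance(X, y, k, rng)

if __name__ == "__main__":
    naive, rcv = simulate()
    print(f"true variance 1.0, naive {naive:.3f}, refitted cross-validation {rcv:.3f}")
```

In this setting the naive estimate tends to fall well below the true variance, because the screened variables were chosen for their chance correlation with the same responses that are then used in the refit. Screening on one half and refitting on the other breaks that dependence, which is the idea behind refitted cross-validation.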
Keywords/Search Tags: High dimensional data, Measurement error data, Missing data, Semiparametric regression models, Variable selection, Confidence region