
Robust And Profile Inferences For Some Nonparametric And Semiparametric Regression Models

Posted on: 2011-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Li
Full Text: PDF
GTID: 1100360302499816
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Nonparametric and semiparametric regression models are well developed and popularly used for their flexibility and/or interpretability in identifying the regression structure between the response variable and the predictor variables. Among semiparametric models, the partially linear model is a commonly used class that is flexible enough and well interpretable. It allows easier interpretation of the effect of each variable and may be preferred to a completely nonparametric regression because of the well-known "curse of dimensionality". Recently, in real medical data analysis, covariate-adjusted models and variable selection problems have become popular and received much attention. However, the common kernel methods are sensitive to the bandwidth and cannot achieve a satisfactory convergence rate in the nonparametric regression setting; estimation for covariate-adjusted partially linear models has received little study; and limited work has been done on variable selection for partially linear models, as noted in Fan and Li (2004). In this thesis we focus on these problems related to nonparametric and semiparametric regression models. More specifically, the motivation and the basic ideas of this thesis are as follows.

It has been shown that the common kernel estimator of the nonparametric regression function admits an approximate expansion of the form \hat{r}_{h_j}(x) = r(x) + c(x)h_j^2 + o_p(h_j^2), where c(x) depends on the curvature of r and on the kernel. From this representation we find a new regression rule: r(x) can be regarded as the intercept when regressing \hat{r}_{h_j}(x) on h_j^2, so we can rebuild a linear regression model and then obtain an estimator of r(x) by the weighted least squares method. The newly proposed estimator has a simple structure and achieves a smaller mean square error without use of a higher-order kernel. We obtain a mean square error of order O(n^{-8/9}), and the optimal bandwidth is of order O(n^{-1/9}).
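The bandwidth-regression idea can be sketched in a few lines. The Gaussian kernel, the bandwidth grid, and the plain least-squares fit below are illustrative choices (the thesis uses weighted least squares), and the function names `nw_estimate` and `two_stage_estimate` are hypothetical:

```python
import numpy as np

def nw_estimate(x0, x, y, h):
    """Nadaraya-Watson kernel estimate of r(x0) with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

def two_stage_estimate(x0, x, y, bandwidths):
    """Two-stage estimate: compute kernel estimates at several bandwidths,
    regress them on h_j^2, and return the fitted intercept.  Because the
    leading bias of the kernel estimator is proportional to h^2, the
    intercept is a bias-corrected estimate of r(x0)."""
    bandwidths = np.asarray(bandwidths, dtype=float)
    r_hat = np.array([nw_estimate(x0, x, y, h) for h in bandwidths])
    design = np.column_stack([np.ones_like(bandwidths), bandwidths ** 2])
    coef, *_ = np.linalg.lstsq(design, r_hat, rcond=None)
    return coef[0]  # intercept = bias-corrected estimate of r(x0)
```

Regressing on h_j^2 targets exactly the leading O(h^2) bias term, which is why the intercept can beat any single-bandwidth kernel estimate without a higher-order kernel.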
Further, we find that if the bandwidths h_j are not optimally selected but satisfy the mild condition h_j = O(n^{-α}) with 1/10 < α < 1/5, the new estimator of r(x) still has a smaller mean square error than the original one. This means that the new estimator is robust to the bandwidth. Besides, under some mild conditions we obtain the asymptotic normality of the new estimator. Thus the two-stage (or three-stage) regression estimation proposed in Chapter 2, which combines nonparametric regression with parametric regression, can improve nonparametric estimation in the sense of both bandwidth selection and convergence rate. More generally, the new method is suitable for general nonparametric regression models regardless of the dimension of the explanatory variable and the structural assumption on the regression function; for example, it extends to the estimation of multivariate nonparametric regression models and additive models.

Motivated by the covariate-adjusted regression (CAR) proposed by Senturk and Muller (2005), and by an applied problem of investigating the relationship between calcium absorption and calcium intake in addressing calcium deficiency, where the effects of body mass index and age are considered, in Chapter 3 we introduce and investigate a covariate-adjusted partially linear regression model (CAPLM), in which both the response Y and the predictor vector X can only be observed after being distorted by multiplicative factors ψ(U) and φ_r(U) respectively (the observed values are ψ(U)Y and φ_r(U)X_r), and an additional variable such as age or period T is taken into account. Although our model seems to be a special case of the covariate-adjusted varying coefficient model (CAVCM) given by Senturk (2006), the data types of CAPLM and CAVCM are basically different.
In Senturk (2006), the observed measurements at a fixed time come from different subjects, which is the key issue enabling the application of CAR in the first step of the estimation procedure proposed there; the data we are concerned with, however, might consist of only one observation at a fixed time. The methods for inferring the two models are therefore different. Following Cui et al. (2008), we can construct nonparametric estimators for ψ(U) and φ_r(U). Then the true unobserved values of Y and X can be approximately recovered. Consequently, by replacing the true data with the recovered ones, β can be estimated by the profile least squares method. Furthermore, under some mild conditions, the asymptotic normality of the estimator for the parametric component is obtained; details can be seen in Section 3.3. Combined with a consistent estimate of the asymptotic covariance, we obtain confidence intervals for the regression coefficients.

With the development of technology, people can easily obtain and store high-dimensional data sets with the number of variables p comparable to or much larger than the sample size n. Variable selection plays an important role in high-dimensional data analysis; among the available methods, the Dantzig selector performs variable selection and model fitting for linear and generalized linear models. In Chapter 4 we focus on variable selection for the partially linear model via the Dantzig selector, defined as the solution of min ||β||_1 subject to ||X'(Y - Xβ)||_∞ ≤ λ, where X and Y are the centered design matrix and the centered response observations respectively. The large-sample asymptotic properties of the Dantzig selector estimator are studied. When n tends to infinity while p is fixed, under some appropriate conditions, the estimator converges in probability to a limit β_0 that solves the corresponding population-level optimization problem. Since β_0 need not coincide with the true parameter, the Dantzig selector might not be consistent.
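For intuition, the Dantzig selector for a plain linear model can be solved as a linear program by splitting β into positive and negative parts. This sketch uses SciPy's LP solver and would be applied after profile centering; it is not the DASSO algorithm used in the thesis, and `dantzig_selector` is a hypothetical name:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Solve  min ||beta||_1  s.t.  ||X'(y - X beta)||_inf <= lam
    as a linear program: beta = u - v with u, v >= 0, so that
    ||beta||_1 = sum(u) + sum(v)."""
    n, p = X.shape
    G = X.T @ X
    Xty = X.T @ y
    c = np.ones(2 * p)  # objective: sum(u) + sum(v)
    # |X'y - G(u - v)| <= lam, split into two one-sided constraints:
    #   G(u - v) <= lam + X'y   and   -G(u - v) <= lam - X'y
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([lam + Xty, lam - Xty])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```

With an orthonormal design (X equal to the identity) the program decouples coordinatewise, and the solution is soft thresholding of y at level λ, which is a convenient sanity check.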
To remedy this drawback, we adopt the adaptive Dantzig selector following Dicker and Lin (manuscript), which reweights the Dantzig selector using a preliminary consistent estimate of β. Moreover, we obtain that the adaptive Dantzig selector estimator for the parametric component of partially linear models has the oracle property under some appropriate conditions: assuming all the regularity conditions hold, when n tends to infinity and p is fixed, the adaptive Dantzig selector estimator is consistent for model selection and its nonzero components are asymptotically normal. As generalizations of the Dantzig selector, both the adaptive Dantzig selector and the Dantzig selector optimization can be implemented by the efficient DASSO algorithm proposed by James et al. (2009). Choices of the tuning parameter and the bandwidth are also discussed.

In summary, we study nonparametric and semiparametric regression models further. Firstly, we propose a robust and bias-corrected estimator in the nonparametric regression setting; the new two-stage (or three-stage) estimator has mean square error of order O(n^{-8/9}) and is robust to the bandwidth selection. Secondly, we investigate covariate-adjusted partially linear models; under some mild conditions, the asymptotic normality of the estimator and confidence intervals for the parametric components are obtained. Finally, we explore variable selection and parameter estimation for partially linear models. When the sample size n tends to infinity and the number of predictor variables p is fixed, the large-sample asymptotic properties of the Dantzig selector estimator for the parameters are studied, and the oracle properties of the adaptive Dantzig selector estimator are obtained under some appropriate conditions.

Some simulations and real data analyses are also conducted to illustrate the new methods.
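The adaptive weighting behind the oracle property in Chapter 4 can be sketched as follows. Here only the l1 objective is reweighted by the reciprocal of a pilot OLS estimate, which is one common form; the exact weighting scheme in Dicker and Lin's manuscript may differ, and `adaptive_dantzig` is a hypothetical name:

```python
import numpy as np
from scipy.optimize import linprog

def adaptive_dantzig(X, y, lam, eps=1e-8):
    """Adaptive Dantzig selector sketch: weight the l1 objective by
    w_j = 1 / |beta_pilot_j|, so coefficients that the pilot estimate
    suggests are large incur little penalty, while near-zero ones are
    heavily penalized (enabling consistent model selection)."""
    n, p = X.shape
    beta_pilot, *_ = np.linalg.lstsq(X, y, rcond=None)  # pilot OLS estimate
    w = 1.0 / (np.abs(beta_pilot) + eps)                # adaptive weights
    G, Xty = X.T @ X, X.T @ y
    c = np.concatenate([w, w])                          # weighted l1 norm of u - v
    A_ub = np.block([[G, -G], [-G, G]])                 # same constraint set as the
    b_ub = np.concatenate([lam + Xty, lam - Xty])       # plain Dantzig selector
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:]
```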
Keywords/Search Tags: nonparametric regression, parametric regression, semiparametric regression, covariate-adjusted regression, partially linear model, high dimensional regression, bandwidth selection, robust, nonparametric estimation, profile least squares