
Robust Estimations And Feature Screenings For Some Nonparametric And Semiparametric Models

Posted on: 2014-01-23    Degree: Doctor    Type: Dissertation
Country: China    Candidate: J Sun    Full Text: PDF
GTID: 1220330398959604    Subject: Probability theory and mathematical statistics
Abstract/Summary:
The interest in nonparametric and semiparametric modeling has grown quickly over the last decades, and a large body of literature investigates various estimation methods for nonparametric and semiparametric regression. Nonparametric models maximize flexibility and minimize the risk of model misspecification. However, the price of this flexibility can be high for several reasons. First, estimation precision decreases rapidly as the dimension of the predictors increases, i.e., the curse of dimensionality, an unavoidable problem in nonparametric estimation. Second, it is difficult to integrate discrete predictors into the nonparametric specification. Third, it is a sophisticated task to graph and interpret the resulting function in the multidimensional case. Semiparametric models offer a compromise between the flexibility of nonparametric models and the interpretability of parametric models: they make assumptions about functional forms that are stronger than those of nonparametric models but less restrictive than those of parametric models, thereby reducing (though not eliminating) the possibility of specification error. Most existing estimation procedures are built on least squares, which is known to be non-robust and to require a finite second moment of the random error. On the other hand, with the rapidly growing ability to collect data, ultrahigh- and high-dimensional data frequently appear in social life and scientific research, so variable selection and feature screening techniques have become another popular research topic in statistics. In this dissertation, we study robust estimation and feature screening methods for nonparametric and semiparametric models, respectively, to further refine the related methods and theory.

Chapter 2 studies the general nonparametric regression model Y = m(T) + σ(T)ε, where Y is the response variable, T is a scalar covariate independent of the random error ε, E(ε) = 0, and var(ε) = 1. Assume that m(·) is smooth and σ(·) is positive. Kai, Li and Zou (2010) proposed the local composite quantile regression (LCQR) method for this model and proved that LCQR can significantly improve the estimation efficiency of local least squares (LLS) for non-normal, symmetric error distributions while losing only a small amount of efficiency for normal errors. However, symmetry of the random errors is an indispensable prerequisite for the estimation consistency of LCQR; without it the LCQR estimate is no longer consistent. In practice the error density is generally unknown, so the assumption of symmetric errors is strong. We therefore put forward a unified method, weighted local composite quantile regression (WLCQR), to construct an unbiased estimate of m(·) for general random errors, both asymmetric and symmetric. For any given point t0, we construct the WLCQR estimator of m(t0) by combining the initial estimators {ak, k = 1, ..., q} computed as in Kai, Li and Zou (2010) with possibly different weights {ωk, k = 1, ..., q} at the uniform quantiles {τk = k/(q+1), k = 1, ..., q}. Denote by F⁻¹(·) the quantile function of the error ε.
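To fix ideas before the formal definition, the following Python sketch (our own illustration, not code from the dissertation) shows how such a WLCQR estimate can be assembled: fit a local linear quantile regression at each uniform quantile τk and combine the resulting intercepts with the weights ωk. The Gaussian kernel, the Nelder-Mead optimizer, the default q = 9, and all function names are assumptions made for the sketch; in practice the weights would be chosen to satisfy the constraint on the error quantiles discussed below.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def local_quantile_intercept(t0, T, Y, tau, h):
    """Local linear quantile fit at t0 with bandwidth h; returns the
    intercept a_k, which estimates m(t0) + sigma(t0) * F^{-1}(tau)."""
    w = np.exp(-0.5 * ((T - t0) / h) ** 2)          # Gaussian kernel weights
    def objective(theta):
        a, b = theta
        return np.sum(w * check_loss(Y - a - b * (T - t0), tau))
    start = np.array([np.median(Y), 0.0])
    return minimize(objective, start, method="Nelder-Mead").x[0]

def wlcqr_estimate(t0, T, Y, weights, h, q=9):
    """Weighted combination of q local quantile fits at the uniform
    quantiles tau_k = k / (q + 1); `weights` plays the role of omega."""
    taus = np.arange(1, q + 1) / (q + 1)
    a_hat = np.array([local_quantile_intercept(t0, T, Y, tau, h) for tau in taus])
    return float(np.dot(weights, a_hat))
```

Note that this sketch fits a separate slope at each quantile for simplicity, whereas the composite approach of Kai, Li and Zou (2010) shares one slope across quantiles; the point of WLCQR is that unequal, suitably constrained weights can cancel the bias induced by asymmetric errors.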
The WLCQR estimator of m(t0) is then defined as a weighted combination of these initial estimators, where the weight vector ω = (ω1, ω2, ..., ωq)ᵀ satisfies suitable conditions. In particular, the condition ω1F⁻¹(τ1) + ··· + ωqF⁻¹(τq) = 0 makes it possible to eliminate the bias term in the asymptotic representation caused by asymmetric random errors, which guarantees that the WLCQR estimate of m(t0) is asymptotically unbiased and consistent. We then establish its asymptotic bias, asymptotic variance and asymptotic normality. Furthermore, since the weight vector ω is not unique, we calculate the theoretically optimal weight vector ω* by minimizing the asymptotic variance and consequently obtain the optimal estimate m*(t0) of m(t0) together with its minimum variance. For the case of symmetric errors, we compare the optimal estimate m*(t0) with both the LLS estimator, denoted by mLS(t0), and the LCQR estimator, denoted by mCQR(t0), in terms of asymptotic relative efficiency. Finite-sample behavior examined through Monte Carlo simulations and a real data analysis further illustrates our theoretical findings.

Chapter 3 examines the varying-coefficient partially linear model Y = Xᵀα(U) + Zᵀβ + ε, where α(U) = {α1(U), ..., αd1(U)}ᵀ is a d1×1 vector of unknown smooth regression coefficient functions and β = (β1, ..., βd2)ᵀ is a d2×1 vector of unknown true parameters. Assume U is univariate and the random error ε is independent of the covariates {U, X, Z}. For any given point u0, we develop a robust estimation procedure for this model via local rank. The model involves both nonparametric and parametric components, which should be estimated at nonparametric and parametric rates of convergence, respectively. Motivated by Kai et al. (2011), we propose a three-stage procedure to achieve local rank estimation. In the first stage, we employ a local linear rank technique to derive initial estimates of β and α(u0). In the second stage, we use global rank regression to improve the convergence rate of the initial estimate of β. In the third stage, we use the local linear rank method again to improve the estimation efficiency of the initial estimate of α(u0). We then establish the asymptotic normality of the local rank estimate βLR of the parametric part β and of the local rank estimate αLR(u0) of the nonparametric part α(u0). Next we calculate the asymptotic relative efficiency (ARE) of the local rank method with respect to the local least squares method. This comparison shows that the local rank method provides a highly efficient and robust alternative to local least squares: it is highly efficient across a wide class of non-normal error distributions and loses only a small amount of efficiency for normal errors. Moreover, it is proved that the loss in efficiency is at most 11.1% for estimating the varying coefficient functions and no greater than 13.6% for estimating the parametric components. Monte Carlo simulations and a real data example are conducted to examine the finite-sample performance, and the numerical results are consistent with our theoretical conclusions.
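To make the rank idea concrete, here is a minimal Python sketch (our illustration, not the dissertation's three-stage algorithm; the pairwise Wilcoxon-type dispersion, the offset-based profiling, and the function names are assumptions). It estimates the parametric part β by minimizing a pairwise rank dispersion of residuals rather than their squared sum, which is what gives rank methods their robustness to heavy-tailed errors; the local stages would apply the same criterion with kernel weights in a neighborhood of u0.

```python
import numpy as np
from scipy.optimize import minimize

def rank_dispersion(resid):
    """Wilcoxon-type dispersion sum_{i<j} |e_i - e_j|: replacing squared
    error with this pairwise criterion yields a rank-based fit."""
    i, j = np.triu_indices(len(resid), k=1)
    return np.abs(resid[i] - resid[j]).sum()

def global_rank_beta(Z, Y, nonpar_offset):
    """Rank estimate of beta in Y = X'alpha(U) + Z'beta + eps, with an
    initial estimate of X'alpha(U) plugged in as `nonpar_offset`."""
    def objective(beta):
        return rank_dispersion(Y - nonpar_offset - Z @ beta)
    beta0 = np.linalg.lstsq(Z, Y - nonpar_offset, rcond=None)[0]  # LS warm start
    return minimize(objective, beta0, method="Nelder-Mead").x
```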
In Chapter 4, we focus on feature ranking and screening methods for ultrahigh-dimensional models. Most existing methods assume a specific model structure and depend heavily on the belief that the working model is close to the underlying true model. Zhu, Li, Li and Zhu (2011) introduced a novel feature screening procedure, sure independent ranking and screening (SIRS), under a general model framework. SIRS does not require imposing a specific structure on the regression function and thus covers a wide variety of commonly used parametric and semiparametric models. However, SIRS may miss some active predictors in certain cases, which Chapter 4 discusses in detail. To further improve SIRS, we first use the "local" information flow of the predictors to define a new measure, and then propose a nonparametric ranking and screening (NRS) method. NRS requires no assumption on the model structure. Its criterion ψk involves a weight function w(xk) satisfying w(xk) ≥ 0 and E[w(Xk)] = 1; a simple choice is w(xk) = 2E[I(Xk < xk)], i.e., twice the marginal distribution function of Xk. We use ψk to measure the marginal utility of the predictor Xk and then establish the ranking consistency of NRS: under some regularity conditions, the ψk values of the active predictors asymptotically dominate those of the inactive predictors, so the active predictors are ranked on top. Moreover, we take the correlations among the active predictors into account and incorporate them into the ranking and screening procedure to make the nonparametric feature screening more comprehensive. By examining various regression models in the simulation study, we find that the new method performs uniformly and significantly better than existing feature screening methods.
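Since the defining formula of the NRS criterion ψk is not reproduced in this abstract, the Python sketch below only illustrates the generic ranking-and-screening mechanics: compute a marginal utility for every predictor, rank, and keep the top n/log(n) of them. The stand-in utility, the weight w(x) ≈ 2F̂k(x), the threshold, and all names are our assumptions, not the NRS statistic itself.

```python
import numpy as np

def marginal_utility(xk, y):
    """Stand-in marginal utility for one predictor: a weighted SIRS-type
    statistic using w(x) ~ 2 * empirical CDF of X_k (so E[w(X_k)] ~ 1).
    This is a placeholder, not the NRS criterion from the dissertation."""
    n = len(y)
    w = 2.0 * (np.argsort(np.argsort(xk)) + 1) / (n + 1)   # ~ 2 * F_k(x_k)
    xc = w * (xk - xk.mean()) / (xk.std() + 1e-12)
    # average squared weighted covariance between X_k and I(Y < y_j)
    return np.mean([np.mean(xc * (y < yj)) ** 2 for yj in y])

def rank_and_screen(X, y, d=None):
    """Rank predictors by marginal utility and keep the top d
    (d defaults to the common n / log(n) hard threshold)."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    psi = np.array([marginal_utility(X[:, k], y) for k in range(p)])
    keep = np.argsort(psi)[::-1][:d]
    return keep, psi
```

In use, one would call keep, psi = rank_and_screen(X, y) and then fit a lower-dimensional working model on the retained columns X[:, keep].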
Keywords/Search Tags: Local composite quantile regression, asymmetric error, consistency, asymptotic relative efficiency, local rank, varying-coefficient partially linear model, ultrahigh dimensional data, function-correlation, marginal utility