Empirical Likelihood Inferences For Semiparametric EV Models And Estimating Equations With Missing Data

Posted on: 2012-11-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Wang
GTID: 1100330335985151
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Empirical likelihood, a nonparametric method first proposed by Owen (1988), has received increasing attention. It has been widely used to construct confidence regions for parameters of interest and for smooth functions of them. The literature has documented many advantages of empirical likelihood over the normal approximation method: for example, the shape and orientation of empirical likelihood confidence regions are determined entirely by the data, and these regions are range preserving and transformation respecting. Empirical likelihood has therefore become a very useful tool for statistical inference, and many authors have applied it to linear, nonparametric and semiparametric regression models.
However, in many fields of application, such as industrial and agricultural production, social surveys, economics, biomedical sciences and epidemiology, it is often impossible to obtain exact or complete measurements of some variables, for a variety of reasons. Complicated data such as measurement error data, missing data and censored data are therefore frequently encountered, and how to handle such data to obtain efficient inferences has become one of the central issues in modern statistical analysis. This thesis studies inference with measurement error data and with missing data: we use empirical likelihood to investigate two classes of semiparametric models with measurement error and estimating equations with missing data, which further broadens the range of applications of empirical likelihood.
In measurement error problems, instead of observing the variable of interest directly, we observe only a surrogate. A simple, classical measurement error model, or errors-in-variables (EV) model, assumes W=X+U, where X is the variable of interest and W is its surrogate, contaminated by an additive measurement error U that is independent of X with E(U)=0.
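As an illustration of the empirical likelihood construction introduced by Owen (1988), here is a minimal Python sketch for the simplest case, a confidence interval for a population mean. It is not code from the thesis: the function names `el_stat` and `el_mean_interval`, the bisection solver for the Lagrange multiplier and the grid inversion are illustrative choices. The statistic -2 log R is calibrated against the 95% quantile of the chi-square distribution with one degree of freedom.

```python
import math

CHI2_1_95 = 3.841458820694124  # 95% quantile of chi-square with 1 df


def el_stat(g, iters=100):
    """-2 log R for scalar estimating-function values g_1, ..., g_n.

    R = max prod(n * p_i) subject to p_i > 0, sum p_i = 1, sum p_i * g_i = 0.
    The optimum is p_i = 1 / (n * (1 + lam * g_i)), where lam solves
    sum g_i / (1 + lam * g_i) = 0; that score is strictly decreasing in lam.
    """
    if min(g) >= 0 or max(g) <= 0:
        return math.inf  # 0 lies outside the convex hull of the g_i
    a, b = -1.0 / max(g), -1.0 / min(g)  # open interval keeping 1 + lam * g_i > 0
    for _ in range(iters):
        lam = 0.5 * (a + b)
        if sum(gi / (1.0 + lam * gi) for gi in g) > 0:
            a = lam  # root lies to the right
        else:
            b = lam
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)


def el_mean_interval(x, grid=400):
    """Approximate 95% EL interval for the mean, by inverting -2 log R on a grid."""
    lo, hi = min(x), max(x)
    kept = []
    for k in range(1, grid):
        mu = lo + (hi - lo) * k / grid
        if el_stat([xi - mu for xi in x]) <= CHI2_1_95:
            kept.append(mu)
    return min(kept), max(kept)


x = [1.2, 0.7, 2.3, 1.9, 0.4, 1.5, 2.8, 1.1]  # hypothetical sample
xbar = sum(x) / len(x)
print(el_stat([xi - xbar for xi in x]))  # essentially 0 at the sample mean
print(el_mean_interval(x))
```

Because the interval comes from inverting -2 log R rather than from a variance estimate, it need not be symmetric about the sample mean; this is the data-determined shape referred to in the abstract.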
The simple linear EV model and the nonlinear EV model have been well studied. With the development of applied sciences, semiparametric regression models have been extensively researched and widely used because of their flexibility and interpretability. Among them, the varying-coefficient partially linear model (VCPLM) and the additive partially linear model (APLM) are two commonly used classes, because they avoid the "curse of dimensionality" of fully nonparametric models while retaining the explanatory power of linear regression. In this thesis we therefore study inference for the VCPLM and the APLM, restricting attention to the classical measurement error model. More specifically, Chapter 2 uses empirical likelihood to make inferences about the parametric and nonparametric components of the varying-coefficient partially linear EV model, and Chapter 3 develops empirical likelihood inference for the additive partially linear EV model.
The semiparametric varying-coefficient partially linear EV model has the form
Y = X'β + Z'α(T) + ε,  W = X + U,
where Y is the response; T, X and Z are regressors, with X observed only through its surrogate W; β=(β1,...,βp)' is a p-dimensional vector of unknown parameters; α(T)=(α1(T),...,αq(T))' is a q-dimensional vector of unknown functions; and ε is a random error with conditional mean zero given X, Z and T. U is the measurement error, with mean zero and independent of (X, Z, T). You and Chen (2006) studied this model and proposed a modified profile least squares estimator for the parametric component and a local polynomial estimator for the nonparametric component. They showed that the former is consistent and asymptotically normal and that the latter achieves the optimal strong convergence rate of nonparametric regression. However, they did not consider the construction of confidence regions for the parametric and nonparametric components.
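The local polynomial idea for the nonparametric component can be sketched in the simplest setting: one varying coefficient (q=1), one error-prone covariate (p=1), and β treated as known. These simplifications, the simulated data, the Epanechnikov kernel, the bandwidth h=0.1 and the function names are assumptions for illustration; this is not the estimator of You and Chen (2006). Because U has mean zero and is independent of (X, Z, T), the partial residual Y - Wβ still has conditional mean Zα(T) given (Z, T), so a local linear fit of that residual on Z recovers α(t).

```python
import math
import random


def epanechnikov(u):
    """Epanechnikov kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear_alpha(t0, T, Z, R, h):
    """Local linear estimate of alpha(t0) in R = Z * alpha(T) + noise (scalar Z).

    Approximates alpha(t) by a + b * (t - t0) near t0, solves the 2x2
    kernel-weighted least squares normal equations for (a, b), returns a.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t, z, r in zip(T, Z, R):
        k = epanechnikov((t - t0) / h)
        if k == 0.0:
            continue
        d = t - t0
        s11 += k * z * z
        s12 += k * z * z * d
        s22 += k * z * z * d * d
        r1 += k * z * r
        r2 += k * z * d * r
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det


def alpha_true(t):
    return math.sin(2.0 * math.pi * t)  # hypothetical coefficient function


rng = random.Random(7)
beta = 1.5  # treated as known here; in practice it would be estimated
T, Z, R = [], [], []
for _ in range(2000):
    t = rng.random()
    z = rng.gauss(1.0, 1.0)
    x = rng.gauss(0.0, 1.0)
    w = x + rng.gauss(0.0, 0.5)  # surrogate W = X + U
    y = beta * x + z * alpha_true(t) + rng.gauss(0.0, 0.5)
    T.append(t)
    Z.append(z)
    R.append(y - beta * w)  # partial residual uses the observed W, not X

for t0 in (0.25, 0.5, 0.75):
    print(t0, local_linear_alpha(t0, T, Z, R, h=0.1), alpha_true(t0))
```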
If the usual normal approximation method were used to construct confidence regions, the result of You and Chen (2006) shows that the limiting variance of the parameter estimator is very complicated, which makes it inconvenient for this purpose. In this thesis we therefore use empirical likelihood to construct confidence regions for the parametric and nonparametric components. We first derive an estimating function for the parameter and, based on it, define an empirical log-likelihood ratio statistic log(R(β)) for the unknown parameter β. We show that, under suitable conditions, -2log(R(β)) is asymptotically standard chi-square distributed, so it can be used directly to construct confidence regions. We also prove that the maximum empirical likelihood estimator (MELE) β̂ of the unknown parameter vector β is asymptotically normal. Then, based on β̂, we propose a residual-adjusted auxiliary random vector for the unknown functions α(t) and define the corresponding residual-adjusted empirical log-likelihood ratio function l(α(t)) for α(t). Under suitable conditions, -2l(α(t)) is asymptotically standard chi-square distributed.
Following the ideas of Chapter 2, Chapter 3 studies empirical likelihood inference for the additive partially linear EV model, which can be written as
Y = X'β + f1(Z1) + ... + fD(ZD) + ε,  W = X + U,
where Y is the response; X and Z=(Z1,...,ZD)' are covariates in R^p and R^D, respectively; f1,...,fD are unknown smooth functions; β=(β1,...,βp)' is a p-dimensional vector of unknown parameters; and ε is a random error with conditional mean zero given X and Z. U is the measurement error, with mean zero and independent of (X, Z, Y). For simplicity, we study the case D=2 and assume E{f1(Z1)}=E{f2(Z2)}=0 to ensure identifiability of the nonparametric functions, and that X and Y are centered.
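The correction for attenuation behind the Chapter 3 estimating function is easiest to see in the simplest special case: a linear EV model Y = Xβ + ε, W = X + U with scalar centered X and Y, no nonparametric part, and a known measurement error variance σ_u². The estimating function g_i(β) = W_i(Y_i - W_iβ) + σ_u²β has mean zero at the true β, because E{W(Y - Wβ)} = E(XY) - β{E(X²) + σ_u²} and E(XY) = βE(X²). This is an assumed textbook simplification, not the thesis's auxiliary vector; the simulated data and function names are hypothetical, and the empirical likelihood routine is repeated so the block runs on its own. The sketch compares the naive least squares slope with the corrected estimate and evaluates -2 log R at both.

```python
import math
import random

CHI2_1_95 = 3.841458820694124  # 95% quantile of chi-square with 1 df


def el_stat(g, iters=100):
    """-2 log R for scalar estimating-function values (Lagrange multiplier by bisection)."""
    if min(g) >= 0 or max(g) <= 0:
        return math.inf
    a, b = -1.0 / max(g), -1.0 / min(g)
    for _ in range(iters):
        lam = 0.5 * (a + b)
        if sum(gi / (1.0 + lam * gi) for gi in g) > 0:
            a = lam
        else:
            b = lam
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)


def simulate(n, beta, sigma_u, sigma_eps, seed):
    """Hypothetical centered data from Y = X * beta + eps, W = X + U."""
    rng = random.Random(seed)
    W, Y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        W.append(x + rng.gauss(0.0, sigma_u))
        Y.append(beta * x + rng.gauss(0.0, sigma_eps))
    return W, Y


def corrected_g(W, Y, beta, sigma_u2):
    """Corrected-attenuation estimating function values g_i(beta)."""
    return [w * (y - w * beta) + sigma_u2 * beta for w, y in zip(W, Y)]


sigma_u2 = 0.5
W, Y = simulate(2000, beta=2.0, sigma_u=math.sqrt(sigma_u2), sigma_eps=0.5, seed=1)
n = len(W)
swy = sum(w * y for w, y in zip(W, Y))
sww = sum(w * w for w in W)
beta_naive = swy / sww                  # probability limit 2 * 1 / (1 + 0.5) = 4/3
beta_corr = swy / (sww - n * sigma_u2)  # solves sum_i g_i(beta) = 0

print(beta_naive, beta_corr)
print(el_stat(corrected_g(W, Y, beta_corr, sigma_u2)))  # essentially 0 at the root
print(el_stat(corrected_g(W, Y, beta_naive, sigma_u2)) > CHI2_1_95)
```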
Using correction for attenuation, we obtain a corrected-attenuation auxiliary vector as an estimating function for the unknown parameter and define the corresponding corrected-attenuation empirical likelihood ratio function. Without requiring undersmoothing of the nonparametric components, we prove that the proposed statistic for the unknown parameter is asymptotically standard chi-square distributed, so it can be conveniently used to construct confidence regions. Simulation studies comparing coverage probabilities and average lengths of confidence intervals indicate that the proposed method outperforms the profile-based least-squares method studied by Liang, Thurston, Ruppert, Apanasovich and Hauser (2008). From the proposed empirical likelihood ratio for β we readily obtain the MELE β̂ of β, and from it corrected backfitting estimators of the nonparametric functions. We then give residual-adjusted empirical log-likelihood ratio statistics for the nonparametric functions and establish the corresponding nonparametric Wilks' theorems. It is worth pointing out that our inference for f1(z1) does not require accurate estimation of f2(z2) at any point; we only need the values of the corrected backfitting estimator of f2(z2) at the sample observations.
In Chapter 4 we investigate estimating equations with missing data. Zhou, Wan and Wang (2008) imputed the estimating function with a nonparametric estimator computed from the observed data and then defined a new estimating function for the unknown parameter.
Because a nonparametric estimator is plugged in, the resulting estimating function is biased, and the empirical likelihood ratio based on it does not converge in distribution to a standard chi-square distribution but to a weighted sum of chi-square variables with unknown weights (see Theorem 3 of Zhou, Wan and Wang (2008)). To recover a standard chi-square limit, an adjustment is needed, and an unknown adjustment factor must be estimated efficiently. Moreover, the nonparametric estimation requires undersmoothing when selecting the bandwidth, so their method is inconvenient for constructing confidence regions for the parameter of interest. Inspired by Xue (2009a, 2009b), we use a weighted-corrected method to reduce the nonparametric bias and define an augmented inverse probability weighted estimating function. Under mild conditions, -2 times the resulting empirical log-likelihood ratio for the unknown parameter is proved to be asymptotically standard chi-square distributed, in contrast to the result of Zhou, Wan and Wang (2008). Our approach therefore avoids estimating an unknown adjustment factor, and commonly used data-driven algorithms can be applied to select an optimal bandwidth. Simulation studies further support the method.
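The augmented inverse probability weighted (AIPW) estimating function can be sketched for the simplest estimating equation, the mean θ = E(Y) with g(Y, θ) = Y - θ, when Y is missing at random given a discrete covariate X. In this hypothetical setup the selection probability π(x) and the regression function m(x) = E(Y | X = x) are estimated by cell frequencies and cell means, a simple nonparametric stand-in for the kernel estimators used in the thesis; the response probabilities, data and function names are assumptions. Each unit contributes δ_i{Y_i - m̂(X_i)}/π̂(X_i) + m̂(X_i) - θ, and -2 log R is computed from these values with the same bisection routine as before, repeated so the block is self-contained.

```python
import math
import random

CHI2_1_95 = 3.841458820694124  # 95% quantile of chi-square with 1 df


def el_stat(g, iters=100):
    """-2 log R for scalar estimating-function values (Lagrange multiplier by bisection)."""
    if min(g) >= 0 or max(g) <= 0:
        return math.inf
    a, b = -1.0 / max(g), -1.0 / min(g)
    for _ in range(iters):
        lam = 0.5 * (a + b)
        if sum(gi / (1.0 + lam * gi) for gi in g) > 0:
            a = lam
        else:
            b = lam
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)


def simulate(n, seed):
    """Hypothetical MAR data: Y = 1 + X + e with X uniform on {0, 1, 2}."""
    rng = random.Random(seed)
    pi_true = {0: 0.9, 1: 0.7, 2: 0.5}  # larger Y is more often missing
    data = []
    for _ in range(n):
        x = rng.randrange(3)
        y = 1.0 + x + rng.gauss(0.0, 1.0)
        d = 1 if rng.random() < pi_true[x] else 0
        data.append((x, y if d else None, d))  # Y is unobserved when d == 0
    return data


def cell_estimates(data):
    """Cell-frequency estimate of pi(x) and cell-mean estimate of m(x) from observed Y."""
    count, seen, total = {}, {}, {}
    for x, y, d in data:
        count[x] = count.get(x, 0) + 1
        if d:
            seen[x] = seen.get(x, 0) + 1
            total[x] = total.get(x, 0.0) + y
    pi_hat = {x: seen[x] / count[x] for x in count}
    m_hat = {x: total[x] / seen[x] for x in count}
    return pi_hat, m_hat


def aipw_g(data, theta, pi_hat, m_hat):
    """AIPW estimating function values for theta = E(Y)."""
    g = []
    for x, y, d in data:
        correction = (y - m_hat[x]) / pi_hat[x] if d else 0.0
        g.append(correction + m_hat[x] - theta)
    return g


data = simulate(3000, seed=3)
pi_hat, m_hat = cell_estimates(data)
theta_hat = sum(aipw_g(data, 0.0, pi_hat, m_hat)) / len(data)  # root of sum g_i = 0
observed = [y for _, y, d in data if d]
cc_mean = sum(observed) / len(observed)  # complete-case mean, biased under this MAR design

print(theta_hat, cc_mean)
print(el_stat(aipw_g(data, theta_hat, pi_hat, m_hat)))
print(el_stat(aipw_g(data, cc_mean, pi_hat, m_hat)) > CHI2_1_95)
```

With a discrete X and cell-mean m̂, the weighted correction terms sum to exactly zero within each cell, so θ̂ here coincides with regression imputation; the augmentation term matters when π and m are smoothed, as with kernel estimators.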
Keywords/Search Tags: Semiparametric regression, empirical likelihood, varying-coefficient partially linear, additive partially linear model, estimating equation, errors-in-variables model, measurement error data, missing data, confidence region, missing at random