Statistical Analysis For Two Types Of Complex Data And The Associated Models

Posted on: 2009-04-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Cui
GTID: 1100360245994125
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Large and complex data sets generated in financial markets, medical diagnostics, environmental surveys and other sources have attracted interest over the past three or four decades, as the rapid growth in computing speed and storage capacity has made it possible to collect, store and analyze them. Such data sets may include outlying values, be measured with error, be collected repeatedly over time from individuals, have very high dimension (large p, small n), and so on. Novel statistical methods capable of dealing with such data are required more than ever in almost all branches of science. This thesis concerns two types of data: data confounded by one common covariate, and data with outlying values. We discuss regression analysis for the former and, for the latter, discriminant analysis and robustification of the quasi-likelihood framework.

An example of data confounded by one common covariate is the fibrinogen data of Kaysen et al. (2003), where the regression of fibrinogen level on serum transferrin level in haemodialysis patients is of interest. Both the observed response and the observed predictor are known to depend on body mass index, defined as weight/height², which thus has a confounding effect on the regression relationship. To explore such confounding in regression and to develop appropriate adjustment methods, Sentürk and Müller (2005) constructed the "covariate-adjusted linear regression" (CALR) model and proposed an estimation method for the regression coefficients that transforms CALR into a varying-coefficient regression model. In Chapter 2, we recommend an alternative procedure that estimates the parameters directly in the following naive manner: the distorting functions are first estimated nonparametrically by regressing the predictors and the response on the common covariate, and the parameter estimators are then constructed by regressing the estimated response on the estimated predictors. Root-n consistency and asymptotic normality of the parameter estimators are obtained. For comparison, a necessary and sufficient condition is provided under which the naive estimators have a smaller limiting variance than those of Sentürk and Müller's method.

For the same type of data as in Chapter 2, Chapter 3 proposes and investigates a "covariate-adjusted nonlinear regression" (CANLR) model. In this model, both the response and the predictor vector can be observed only after being distorted by multiplicative factors. Because of the nonlinearity, the estimation method that Sentürk and Müller (2005) proposed for the linear case cannot be employed directly. To address this problem, we estimate the distorting functions as in Chapter 2 and then obtain nonlinear least squares estimators of the parameters from the estimated response and predictors. Again, root-n consistency and asymptotic normality are achieved. However, the limiting variance has a very complicated structure with several unknowns, so confidence regions based on the normal approximation are not efficient. To avoid estimating the limiting variance, empirical likelihood-based confidence regions are constructed and their accuracy is verified. It is somewhat surprising that, unlike the usual results derived from profile methods, with our method the limit of the empirical likelihood ratio is still chi-squared distributed even when a plug-in estimate replaces the infinite-dimensional nuisance parameters (the distorting functions). This property makes it possible to construct the empirical likelihood-based confidence regions.

Outliers often indicate the most interesting data points, such as polluted areas in environmental data or irregularities in the online monitoring of patients. Classical discriminant rules can be strongly influenced by outliers in the training sample, which can make the results unreliable. This creates a need for robust alternatives that behave more stably in the presence of outliers in the data. The existing literature provides results for robust discriminant analysis, although these results are mainly restricted to linear or quadratic discriminant analysis. In Chapter 4, we study robust nonparametric discriminant analysis for this class of data in terms of our newly defined extended projection depth (abbreviated EPD), where the classification rule assigns an observation to the population with respect to which it has the maximum EPD. Asymptotic properties of the misclassification rates and robustness properties of the EPD-based classifier are discussed. It is found that when the underlying distributions are elliptically symmetric, the EPD-based classifier is asymptotically equivalent to the optimal Bayes classifier.

The final chapter gives a general procedure, rather than concentrating on particular problems, for constructing robust quasi-likelihood estimating functions for discrete stochastic processes by downweighting outlying orthogonal differences via the original projection depth. In this study we consider observations that correspond to such processes with additive outliers. As usual, this produces an estimating function that has certain optimality properties within a specified class of estimating functions. The resulting estimating functions and parameter estimators have desirable robustness, attaining very high breakdown values close to 1/2(p+1). At the same time, the resulting parameter estimators retain the usual asymptotic behavior, such as asymptotic normality. We also discuss the change in efficiency that robustness entails. Simulations and real data applications are used to illustrate the various methods.
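
The naive two-step procedure described for Chapter 2 can be illustrated with a minimal sketch. It is not the thesis's implementation, and it rests on assumptions that the abstract does not state for the linear case: the distortion is taken to be multiplicative (observed response = psi(U) * Y, observed predictor = phi(U) * X), the distorting functions are taken to have mean one, and U is taken to be independent of (X, Y). Under these assumptions psi(U) equals E[observed Y | U] / E[observed Y], which a kernel smoother can estimate. The function names, the Gaussian kernel, the bandwidth and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def nw_smooth(u, v, bandwidth):
    """Nadaraya-Watson estimate of E[v | u] at each observed u (Gaussian kernel)."""
    diffs = (u[:, None] - u[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return weights @ v / weights.sum(axis=1)

def undistort(u, v_obs, bandwidth):
    """Divide out the estimated distorting function.

    Assumes v_obs = g(u) * v with E[g(U)] = 1 and U independent of v,
    so that g(u) = E[v_obs | u] / E[v_obs].
    """
    g_hat = nw_smooth(u, v_obs, bandwidth) / v_obs.mean()
    return v_obs / g_hat

def naive_calr(u, x_obs, y_obs, bandwidth=0.1):
    """Step 1: estimate the undistorted response and predictors.
    Step 2: ordinary least squares of estimated response on estimated predictors."""
    y_hat = undistort(u, y_obs, bandwidth)
    x_hat = np.column_stack(
        [undistort(u, x_obs[:, j], bandwidth) for j in range(x_obs.shape[1])]
    )
    design = np.column_stack([np.ones(len(u)), x_hat])
    coef, *_ = np.linalg.lstsq(design, y_hat, rcond=None)
    return coef  # [intercept, slope_1, ..., slope_p]

# Simulated illustration (hypothetical data; true intercept 1, true slope 2).
rng = np.random.default_rng(0)
n = 1500
u = rng.uniform(0.0, 1.0, n)                  # common confounding covariate
x = rng.normal(2.0, 1.0, size=(n, 1))         # unobserved predictor
y = 1.0 + 2.0 * x[:, 0] + rng.normal(0.0, 0.5, n)  # unobserved response
psi = 1.0 + 0.5 * (u - 0.5)                   # distortion of response, mean one
phi = u + 0.5                                 # distortion of predictor, mean one
y_obs = psi * y
x_obs = phi[:, None] * x

coef = naive_calr(u, x_obs, y_obs)
print("estimated [intercept, slope]:", coef)
```

The bandwidth here is fixed by hand; in practice it would need to be chosen from the data, and kernel smoothers of this kind are known to be less accurate near the boundary of the range of U.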
Keywords/Search Tags: Complex data, covariate-adjusted regression, robust discriminant analysis, robust quasi-likelihood, ordinary least squares, projection data depth, kernel estimation, empirical likelihood, confidence regions, asymptotic behaviors