
Sparse Estimation For High-dimensional Data With Applications

Posted on: 2020-03-17 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X Li | Full Text: PDF
GTID: 1360330620456414 | Subject: Computational Mathematics
Abstract/Summary:
High-dimensional statistics deals with the case where the number of unknown parameters n may be comparable to, or substantially larger than, the sample size m. In this high-dimensional scenario, estimating the true underlying parameter is acknowledged as a vital challenge in contemporary statistics, since without additional constraints it is usually impossible to obtain consistent estimators. Fortunately, many empirical studies show that high-dimensional data sets usually possess some low-dimensional structure, and taking this structure into consideration is beneficial for estimation. The most commonly used structure is sparsity, which has been widely exploited in sparse linear regression, low-rank matrix regression, sparse covariance and inverse covariance matrix estimation, and so on. Meanwhile, many well-known estimators are formulated as solutions to optimization problems, and the high dimensionality also poses computational challenges in terms of effectiveness and efficiency. There have been many fruitful results based on convex optimization methods, in both statistical and algorithmic aspects. Recently, thanks to their stronger sparsity-inducing capacity, nonconvex optimization methods have attracted increasing attention in the field of sparse estimation. Owing to the intrinsic nonconvexity, local solutions of nonconvex optimization problems are usually not global solutions; numerical methods typically terminate at particular local solutions, yet statistical guarantees for global solutions and for arbitrary local solutions are still limited. In this dissertation, we investigate sparse estimation for high-dimensional data, analyse nonconvex optimization methods in both statistical and algorithmic aspects, and establish statistical guarantees for global and local solutions. Finally, on the application side, we investigate the heritability estimation problem in the field of life sciences and propose a novel method that estimates heritability by exploiting the underlying sparsity in genome-wide association studies (GWASs). The main work of this dissertation is organized as follows.

In Chapter 2, we consider one of the most popular nonconvex regularizers, the ℓ_q (0 < q < 1) norm. We discuss the statistical properties of the ℓ_q optimization methods (0 < q ≤ 1), including the ℓ_q minimization method and the ℓ_q regularization method, with high-dimensional linear regression as an example. For this purpose, we introduce a weaker ℓ_q-restricted eigenvalue condition (ℓ_q-REC) and provide sufficient conditions for it in terms of several widely used regularity conditions, such as the sparse eigenvalue condition, the restricted isometry property, and the mutual incoherence property. Then, for either deterministic or random designs, we show by virtue of the ℓ_q-REC that the ℓ_2 recovery bounds and oracle properties hold with high probability for the global solutions of the ℓ_q minimization method and the ℓ_q regularization method, respectively. These results demonstrate the statistical consistency of the lower-order optimization methods under a weaker condition and provide a unified framework for analysing the statistical properties of ℓ_q optimization methods. Finally, numerical results verify the established statistical properties and demonstrate the advantages of the ℓ_q optimization methods over some existing sparse optimization methods.
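For concreteness, the display below records generic forms of the two ℓ_q methods for the linear model y = Xβ* + e with design matrix X of size m × n. The noise level η and regularization parameter λ are illustrative symbols for this sketch and do not necessarily match the conventions used in the dissertation.

```latex
% Generic sketch of the two \ell_q formulations for the linear model
% y = X\beta^* + e with X of size m x n; \eta (noise level) and
% \lambda (regularization parameter) are illustrative symbols, not
% necessarily the dissertation's notation.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
\begin{align*}
  \|\beta\|_q^q &:= \sum_{j=1}^{n} |\beta_j|^q , \qquad 0 < q \le 1, \\
  \text{($\ell_q$ minimization)} \quad
  \hat{\beta} &\in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^n}
      \|\beta\|_q^q
      \quad \text{s.t.} \quad \|y - X\beta\|_2 \le \eta, \\
  \text{($\ell_q$ regularization)} \quad
  \hat{\beta} &\in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^n}
      \tfrac{1}{2m}\,\|y - X\beta\|_2^2 + \lambda \|\beta\|_q^q .
\end{align*}
\end{document}
```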
In Chapter 3, we analyse the statistical and algorithmic properties of local solutions of general nonconvex regularized M-estimation problems, whose regularizers contain the commonly used SCAD and MCP penalties as special cases. On the statistical side, we establish an ℓ_2 recovery bound for any stationary point of the nonconvex regularized M-estimation problem, under restricted strong convexity and some regularity conditions on the loss function and the regularizer, respectively. This result is algorithm-independent and implies that every stationary point of the nonconvex optimization problem lies in a small neighbourhood of the true parameter, thus guaranteeing statistical consistency for all stationary points. On the algorithmic side, to solve the nonconvex optimization problem we slightly decompose the objective function and then apply the proximal gradient method (a minimal sketch of such a scheme is given after the chapter summaries below). The algorithm is proved to achieve a linear convergence rate, which is the fastest convergence rate that a first-order optimization method can attain. In particular, for SCAD and MCP a simpler decomposition is applicable thanks to our general assumption on the regularizer, which helps to construct iterates with better estimation performance. Finally, we illustrate the theoretical consequences with several numerical experiments on corrupted errors-in-variables linear regression models; the numerical results agree closely with the theory.

Heritability is a critical measure in the exploration of the genetic architecture of human complex traits. It measures how much of the variation of a phenotypic trait in a population is caused by genetic variation among the individuals in that population. Data from ultrahigh-dimensional GWASs have been used to estimate heritability in recent years. Existing methods are based on the linear mixed model, under the assumption that the genetic effects are random variables, which is opposite to the fixed-effect assumption embedded in the framework of quantitative genetics theory. Moreover, the heritability estimates produced by existing methods may have a large standard error, so the reliability of the estimator is doubtful. In Chapter 4, we first investigate the influence of the fixed- and random-effect assumptions on heritability estimation and prove that the two assumptions are equivalent under mild conditions. We then propose a two-stage strategy that first performs sparse regularization via the cross-validated elastic net and then applies variance estimation methods to the reduced model to construct reliable heritability estimates (a rough sketch also follows below). Results on both simulated and real data show that our strategy achieves a considerable reduction in the standard error while preserving accuracy. The proposed strategy suggests that reliable estimates can still be obtained even with a relatively restricted sample size, and it should be especially useful for large-scale heritability analyses in the genomics era.
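To make the algorithmic component of Chapter 3 concrete, here is a minimal proximal gradient sketch for MCP-regularized least squares. The particular decomposition used here (MCP written as an ℓ_1 penalty plus a smooth concave remainder), the step-size rule, and all function names and default parameters are illustrative choices for this sketch, not the decomposition or implementation analysed in the dissertation.

```python
import numpy as np


def mcp_concave_grad(beta, lam, gamma):
    """Gradient of the smooth concave part q in the illustrative split
    MCP(t; lam, gamma) = lam * |t| + q(t); q is differentiable everywhere."""
    return np.where(np.abs(beta) <= gamma * lam,
                    -beta / gamma,
                    -lam * np.sign(beta))


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def prox_grad_mcp(X, y, lam=0.1, gamma=3.0, step=None, n_iter=500):
    """Proximal gradient iteration for
        (1/(2m)) * ||y - X beta||^2 + MCP(beta; lam, gamma),
    splitting the objective into a smooth part (least squares plus the
    concave remainder of MCP) and the nonsmooth part lam * ||beta||_1.
    Defaults are illustrative only."""
    m, n = X.shape
    if step is None:
        # Conservative step size: reciprocal of an upper bound on the
        # Lipschitz constant of the smooth part's gradient.
        step = 1.0 / (np.linalg.norm(X, 2) ** 2 / m + 1.0 / gamma)
    beta = np.zeros(n)
    for _ in range(n_iter):
        grad_smooth = X.T @ (X @ beta - y) / m + mcp_concave_grad(beta, lam, gamma)
        beta = soft_threshold(beta - step * grad_smooth, step * lam)
    return beta
```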
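Similarly, the two-stage strategy of Chapter 4 can be outlined in a few lines: screen SNPs with a cross-validated elastic net, then estimate variance components on the reduced model. The snippet below is only a rough sketch under simplifying assumptions (a fully observed, standardized genotype matrix, a plain least-squares refit, and plug-in variance estimates); the function and variable names are hypothetical, and the dissertation's actual variance estimation step may differ.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV


def two_stage_heritability(G, y, l1_ratio=0.5, cv=5):
    """Rough sketch of a two-stage heritability estimate.

    Stage 1 screens SNPs with a cross-validated elastic net; stage 2
    estimates genetic and residual variance components on the reduced
    model.  G: (m, n) standardized genotype matrix, y: (m,) phenotype.
    """
    # Stage 1: sparse screening via cross-validated elastic net.
    enet = ElasticNetCV(l1_ratio=l1_ratio, cv=cv).fit(G, y)
    selected = np.flatnonzero(enet.coef_)
    if selected.size == 0:
        return 0.0

    # Stage 2: refit the reduced model by least squares and form
    # simple plug-in variance estimates (the dissertation applies
    # dedicated variance estimation methods at this step).
    G_sel = G[:, selected]
    beta_hat, *_ = np.linalg.lstsq(G_sel, y - y.mean(), rcond=None)
    genetic_part = G_sel @ beta_hat
    var_genetic = np.var(genetic_part, ddof=1)
    var_residual = np.var(y - y.mean() - genetic_part, ddof=1)

    # Heritability: share of phenotypic variance explained by the
    # selected genetic effects.
    return var_genetic / (var_genetic + var_residual)
```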
Keywords/Search Tags:high dimension, sparsity, parameter estimation, nonconvex optimization method, recovery bound, consistency, convergence rate, heritability