On The Population Covariance Matrix In Large Dimensional Data

Posted on: 2014-01-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Wang
GTID: 1220330398464257
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Over the past two decades, high-dimensional data has been one of the hottest topics in statistics. Such data are often referred to as high-dimension, low-sample-size data, or "large p, small n" data, where p is the number of data dimensions and n is the sample size. High-dimensional data pose great challenges to traditional statistical and computational methods. In other words, the influence of the data dimension p can no longer be ignored, as it was in classical statistical analysis.

The covariance matrix is of fundamental importance in multivariate analysis, with a wide range of applications across statistical analysis. These include dimension reduction by principal component analysis (PCA), classification by linear or quadratic discriminant analysis (LDA and QDA), establishing independence and conditional independence relations in the context of graphical models, setting confidence intervals on linear functions of the means of the components, Hotelling's T^2 statistic, Markowitz mean-variance analysis, and so on. In this work, we address three aspects of the large dimensional sample covariance matrix.

1. We study the limiting spectral distribution (LSD) of population covariance matrices and sample covariance matrices for general stationary time series. For stationary linear processes, we derive the LSD of the population covariance matrix directly and express the LSD of the sample covariance matrix through the power spectral density function, which builds a connection between the limiting spectral distribution and the power spectral density function. As applications, the classical Marchenko-Pastur (M-P) law and the AR(1), MA(1), ARMA(1,1) and m-dependent models are also studied, and the results can be extended to models with a similar population covariance matrix.

2. We redefine the LRT and LW tests using classical sample covariance matrices. The distributions of the two new tests are derived under general conditions that accommodate data with unknown means and non-Gaussian distributions. Moreover, the asymptotic distribution of the LRT is studied under local alternatives, and an explicit expression for the power is derived under some conditions.

3. We consider the asymptotic properties of the matrix and its relation with (S_n + λI_p)^{-1}. Here, Σ_p is the population covariance matrix and S_n is the sample covariance matrix. Based on these limiting results, we propose an optimal linear combination of S_n and I_p under the loss function. The new estimator is nonparametric: it assumes no specific parametric distribution for the data and uses no prior information about the structure of the population covariance matrix. The new estimator does not require p<n and is applicable when p≥n.
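
The link in item 1 between a population covariance spectrum and the power spectral density can be checked numerically in the AR(1) case. The sketch below is an illustration only, not the dissertation's derivation. It relies on Szegő's classical theorem for Toeplitz matrices and assumes an AR(1) process with unit innovation variance. The function names, the coefficient phi = 0.5 and the dimension p = 400 are arbitrary choices made for this example.

```python
import numpy as np


def ar1_population_covariance(p, phi):
    """Toeplitz covariance of p consecutive AR(1) observations (unit innovation variance)."""
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return phi ** lags / (1.0 - phi ** 2)


def ar1_spectral_symbol(theta, phi):
    """f(theta) = sum_k gamma_k exp(i k theta) = 1 / (1 - 2 phi cos(theta) + phi^2).

    This is 2*pi times the power spectral density under the usual 1/(2*pi) normalization.
    """
    return 1.0 / (1.0 - 2.0 * phi * np.cos(theta) + phi ** 2)


p, phi = 400, 0.5  # illustrative values, not taken from the source
eigvals = np.linalg.eigvalsh(ar1_population_covariance(p, phi))

# Symbol values on an equally spaced grid stand in for f(U), U uniform on [0, 2*pi).
theta = 2.0 * np.pi * (np.arange(p) + 0.5) / p
symbol = ar1_spectral_symbol(theta, phi)

# Szego's theorem: spectral moments of the Toeplitz matrix approach moments of f(U).
for k in (1, 2):
    print(f"moment {k}: eigenvalues {np.mean(eigvals ** k):.4f}, "
          f"symbol {np.mean(symbol ** k):.4f}")
print(f"eigenvalue range: [{eigvals.min():.4f}, {eigvals.max():.4f}]")
print(f"symbol range:     [{symbol.min():.4f}, {symbol.max():.4f}]")
```

Setting phi = 0 reduces the population covariance matrix to the identity, which is the setting of the classical M-P law for the sample covariance matrix.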
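
The claim in item 3 that a linear combination of S_n and I_p remains usable when p≥n can be illustrated with a short sketch. The weights alpha and beta below are hypothetical and picked arbitrarily. The optimal weights and the loss function proposed in the dissertation are not given in this abstract and are not reproduced here. The data are simulated standard normal draws, used only to produce a singular S_n.

```python
import numpy as np


def sample_covariance(X):
    """Classical sample covariance of an n x p data matrix (rows are observations)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc.T @ Xc / (X.shape[0] - 1)


def linear_combination(S, alpha, beta):
    """Return alpha * S + beta * I; alpha and beta are hypothetical, user-chosen weights."""
    return alpha * S + beta * np.eye(S.shape[0])


rng = np.random.default_rng(0)
n, p = 20, 50  # p >= n, so S_n has rank at most n - 1 and is singular
X = rng.standard_normal((n, p))

S_n = sample_covariance(X)
estimate = linear_combination(S_n, alpha=0.5, beta=0.5)  # arbitrary weights

print("rank of S_n:", np.linalg.matrix_rank(S_n))
print("smallest eigenvalue of S_n:", np.linalg.eigvalsh(S_n).min())
print("smallest eigenvalue of the combination:", np.linalg.eigvalsh(estimate).min())
```

Because S_n is positive semidefinite, any combination with alpha ≥ 0 and beta > 0 has all eigenvalues at least beta, so it is invertible even though S_n itself is not.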
Keywords/Search Tags: Asymptotic power, Empirical spectral distribution, High dimensional data, Hypothesis test, Large dimensional data, Limiting spectral distribution, Loss function, Population covariance matrices, Sample covariance matrices, Shrinkage estimation