
Covariance And Population Mean Tests On High-dimensional Data

Posted on: 2021-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
GTID: 2370330626455386
Subject: Statistics
Abstract/Summary:
With the rapid development of science and technology and the arrival of the big-data era, the dimension of statistical data grows together with the sample size, and it is sometimes much larger than the sample size. This growth in dimension poses significant challenges for mathematical research and for statistical data analysis. Classical statistical theory is derived under the assumption that the dimension is fixed while the sample size tends to infinity, and the sparsity of high-dimensional data also runs contrary to the assumptions of traditional theory. Consequently, classical multivariate statistical theory cannot be applied directly to high-dimensional data, which is characterized by the "large p, small n" phenomenon. In practice, many traditional multivariate methods and their applications must be modified merely to accommodate a larger dimension p. Under the "large p, small n" setting, classical methods and theories are difficult to apply and no longer perform well, so good testing procedures designed for high-dimensional data are needed.

This thesis studies two basic hypothesis testing problems of multivariate statistical analysis for high-dimensional data: testing the equality of covariance matrices and testing the equality of population means. New methods are proposed for both problems in the regime where the dimension and the sample size both tend to infinity.

For the first problem, testing the equality of two high-dimensional covariance matrices, the thesis proposes a new test statistic, TNew, and derives its asymptotic distribution using the central limit theorem for linear spectral statistics of the F-matrix. The new test removes the restriction on the ratio (proportionality) parameters imposed in Xu, and it improves performance for both normal and non-normal high-dimensional data.

For the second problem, testing high-dimensional population means, the thesis proposes a new test statistic, Tn*, to obtain a more powerful test. The new statistic removes restrictions on the relationship between the data dimension and the sample sizes, and it also performs well in MANOVA testing problems in which different samples follow different distributions. Numerical simulations show that the two proposed statistics, TNew and Tn*, are more robust than the competing tests.
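The abstract does not give the formulas for TNew or Tn*, so the sketch below does not implement them. Instead it illustrates, under stated assumptions, two classical building blocks of the kind the abstract refers to. The first is the Bai–Saranadasa (1996) two-sample mean statistic, a standard high-dimensional alternative to Hotelling's T², which is asymptotically N(0,1) under H0 as p and n grow. The second is the simplest F-matrix linear spectral statistic, tr(S1 S2⁻¹)/p (the mean of the F-matrix eigenvalues, i.e. f(x) = x), together with its exact null mean (n2 − 1)/(n2 − p − 2) for Gaussian data with Σ1 = Σ2. That null mean requires n2 > p + 2 and tends to 1/(1 − p/n2). The variance term from the F-matrix CLT is not implemented, so the second quantity is only a centred statistic, not a calibrated test. All function names are hypothetical, and the code is pure standard-library Python for portability rather than speed.

```python
import math
import random


def mean_vec(x):
    """Column means of a sample stored as a list of rows."""
    n, p = len(x), len(x[0])
    return [sum(row[j] for row in x) / n for j in range(p)]


def scatter(x):
    """Centred scatter matrix: sum_i (x_i - xbar)(x_i - xbar)^T."""
    m = mean_vec(x)
    p = len(m)
    a = [[0.0] * p for _ in range(p)]
    for row in x:
        d = [row[j] - m[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                a[i][j] += d[i] * d[j]
    return a


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (a must be non-singular)."""
    p = len(a)
    m = [list(a[i]) + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(p):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[p:] for row in m]


def bai_saranadasa_z(x, y):
    """Bai-Saranadasa (1996) two-sample mean statistic; approx. N(0,1) under H0."""
    n1, n2 = len(x), len(y)
    n = n1 + n2 - 2
    tau = n1 * n2 / (n1 + n2)
    sx, sy = scatter(x), scatter(y)
    p = len(sx)
    s = [[(sx[i][j] + sy[i][j]) / n for j in range(p)] for i in range(p)]  # pooled S_n
    d = [a - b for a, b in zip(mean_vec(x), mean_vec(y))]
    m_n = sum(v * v for v in d) - trace(s) / tau
    tr_s2 = sum(s[i][j] * s[j][i] for i in range(p) for j in range(p))
    # Unbiased (under normality) estimator of tr(Sigma^2).
    b2 = n * n / ((n + 2) * (n - 1)) * (tr_s2 - trace(s) ** 2 / n)
    return tau * m_n / math.sqrt(2 * (n + 1) / n * b2)


def f_matrix_trace(x, y):
    """Linear spectral statistic tr(S1 S2^{-1}) / p of the F-matrix (needs n2 > p)."""
    n1, n2 = len(x), len(y)
    s1 = [[v / (n1 - 1) for v in row] for row in scatter(x)]
    s2 = [[v / (n2 - 1) for v in row] for row in scatter(y)]
    return trace(matmul(s1, inverse(s2))) / len(s1)


def f_null_mean(p, n2):
    """E[tr(S1 S2^{-1})/p] for Gaussian data with equal covariances (n2 > p + 2)."""
    return (n2 - 1) / (n2 - p - 2)


rng = random.Random(0)
p, n1, n2 = 20, 60, 60
x = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n1)]
y = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n2)]
y_shift = [[v + 1.0 for v in row] for row in y]
print(bai_saranadasa_z(x, y))        # mean test under H0
print(bai_saranadasa_z(x, y_shift))  # mean test with shifted means
print(f_matrix_trace(x, y) - f_null_mean(p, n2))  # centred F-matrix trace
```

With equal means, the first value should behave like a standard normal draw. With every coordinate of the second mean shifted by 1, the statistic becomes large and positive. The centred F-matrix trace should stay close to zero when the two covariance matrices are equal.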
Keywords/Search Tags: High Dimensional Data, Population Mean, Covariance Matrix, Asymptotic, Random Matrix, Hypothesis Testing, F-matrix