Permutation Tests For Two-Sample Means Of High-dimensional Data

Posted on: 2022-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2480306524481634
Subject: Statistics
Abstract/Summary:
Testing hypotheses about the mean vectors of high-dimensional data is an active and difficult research problem. Many practical problems involve testing two-sample means with high-dimensional data, such as whether gene expression differs between groups or whether the efficacy of two treatment regimens differs significantly. On the one hand, classical limit theory is derived under the assumption that the dimension is fixed while the sample size tends to infinity, so many classical multivariate statistical methods are no longer suitable for high-dimensional data. On the other hand, most existing two-sample mean tests for high-dimensional data require the population distribution to be specified or assumed to be normal. In practice the population distribution is generally unknown or uncertain, so these methods are often not applicable.

In this paper, we propose a permutation test based on marginally standardized statistics for the two-sample mean hypothesis with high-dimensional data, and we focus on the validity conditions and the efficiency of the test. The test statistic requires only the diagonal elements of the covariance matrix, so its form is simple and the computational burden is greatly reduced when it is used in a permutation test. Using the test statistic requires the uniform validity condition Σ_X = Σ_Y or c = 1/2, that is, the two populations have the same variance or the two samples have the same size. Following the Bootstrap principle that generated data share the properties of the original observations, pseudo samples are drawn from the larger sample so that the two samples have the same size. By introducing the pseudo samples into the test statistic to form the empirical distribution function, we find that the permutation test based on marginally standardized statistics is asymptotically consistent under mild conditions.

Finally, simulation experiments compare the empirical size and empirical power of the new test with those of two existing methods. The results show that the new test better controls the probability of a Type I error and has higher power, with an advantage in most (n, p) combinations. In a real-data application, we collect and analyze stock prices, as high-dimensional data, for six industries in China's A-share market from May 2001 to October 2017. Five of the industries show a significant "sell in May" effect, meaning that the average monthly stock return from May to October is not equal to that from November to April of the previous year.
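
The general idea of a permutation test built on a marginally standardized statistic can be sketched as follows. This is a minimal illustration only and not the thesis's actual procedure: the abstract does not give the exact statistic, so the form used here (a sum over coordinates of squared mean differences divided by per-coordinate variance terms) is an assumption, as are the function names `marginal_stat`, `balance_by_bootstrap`, and `permutation_test`. The balancing helper is one possible reading of drawing pseudo samples from the larger sample.

```python
import numpy as np


def marginal_stat(x, y):
    """Assumed statistic: sum of squared, variance-scaled mean differences.

    Only per-coordinate variances (the diagonal of each sample covariance
    matrix) are used. Assumes every coordinate has nonzero sample variance.
    """
    n, m = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    scale = x.var(axis=0, ddof=1) / n + y.var(axis=0, ddof=1) / m
    return float(np.sum(diff ** 2 / scale))


def balance_by_bootstrap(x, y, rng):
    """Hypothetical helper: resample the larger sample with replacement
    so that both samples have the size of the smaller one."""
    n, m = x.shape[0], y.shape[0]
    k = min(n, m)
    if n > k:
        x = x[rng.integers(0, n, size=k)]
    if m > k:
        y = y[rng.integers(0, m, size=k)]
    return x, y


def permutation_test(x, y, n_perm=999, seed=0):
    """Return the observed statistic and a Monte Carlo permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = marginal_stat(x, y)
    pooled = np.vstack([x, y])
    n = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled rows to two groups of the original sizes.
        idx = rng.permutation(pooled.shape[0])
        stat = marginal_stat(pooled[idx[:n]], pooled[idx[n:]])
        exceed += stat >= observed
    # Add one to numerator and denominator so the p-value is never zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

In this sketch each row of `x` and `y` is one observation and each column one coordinate. Because the statistic never touches off-diagonal covariance entries, each permutation costs time linear in the dimension, which is what keeps the resampling loop affordable when the dimension is large.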
Keywords/Search Tags: permutation tests, high-dimensionality, test of mean-difference, Bootstrap methods, consistency of test