Study Of Statistical Inference Methods For Correlation Of Two Types Of High-dimensional Data

Posted on: 2023-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Zhou
Full Text: PDF
GTID: 2530306935495594
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the rapid development and widespread application of science and technology, high-dimensional data have emerged in large numbers across many fields, and how to mine the information in such data efficiently has become an important topic in recent years. The study and analysis of correlation is an effective means of extracting information from high-dimensional variables and has attracted much attention from statisticians. This thesis addresses the problem that complex correlations among high-dimensional variables easily lead to errors when screening variables or analyzing their relationships. It proposes, in turn, a PLS variable selection method based on the Mahalanobis distance correlation and a generalized projection test for partial correlation coefficients in high-dimensional normal data. Chapter 1 introduces the research background and significance of statistical inference methods for high-dimensional data, as well as the state of research at home and abroad. Chapter 2 presents the derivation and calculation of the correlation coefficient, the partial correlation coefficient, and the distance correlation coefficient, together with the necessary background on projections. Chapter 3 builds on the SIS method, the DC-SIS method, and the distance correlation t test to propose a variable selection method based on the Mahalanobis distance correlation. This method is then combined with the PLS model to address the problem that, under complex correlations, the extracted components may maximize variance without the correlation between the components and the prediction variables also being maximized. The proposed method rests on the idea of introducing the correlation type into the PLS model. Numerical simulations and case studies show that the variable selection ability of the PLS model improves once the correlation-based variable selection method is introduced, and that, compared with other methods, the proposed method shows varying degrees of improvement in prediction ability and stability when identifying complex correlations among high-dimensional variables. Chapter 4 deals further with complex correlations between high-dimensional variables, focusing on the problem that partial correlation coefficients cannot be calculated by classical methods when the control variables are high-dimensional. It introduces the distribution of the partial correlation coefficient and its hypothesis test in the low-dimensional case, constructs an orthogonal projection matrix using the relationship between the regression coefficients of the control variables and the partial correlation coefficient analyzed in Chapter 2, and proposes a generalized projection test for partial correlation coefficients in high-dimensional normal data; the feasibility of the algorithm is demonstrated theoretically. Finally, the proposed method is tested on simulated data with different sparsity assumptions, different data structures, and different degrees of partial correlation, and is compared with a method that combines regularization with quadratic regression for partial correlation coefficients. The simulation results show that the proposed algorithm performs well in estimating and testing partial correlation coefficients while reducing computational cost.
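As background for the low-dimensional case mentioned in the abstract, the following is a minimal sketch of the classical partial correlation coefficient computed by orthogonal projection: both variables are regressed on the control variables, and the residuals are correlated. It is not the thesis's generalized projection test. The function name `partial_correlation`, the simulated data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def partial_correlation(x, y, Z):
    """Classical partial correlation of x and y given the columns of Z.

    Illustrative helper (hypothetical name): valid only when the number
    of control variables is small relative to the sample size.
    """
    n = len(x)
    # Add an intercept column so the projection also removes the means.
    Z1 = np.column_stack([np.ones(n), Z])
    # Residuals after least-squares projection onto the column space of Z1.
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    # Correlation of the two residual vectors.
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))


# Simulated example (assumed data): x and y are related only through Z.
rng = np.random.default_rng(0)
n, q = 500, 3
Z = rng.normal(size=(n, q))
x = Z @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
y = Z @ np.array([0.8, 0.2, -0.4]) + rng.normal(size=n)

marginal = float(np.corrcoef(x, y)[0, 1])
partial = partial_correlation(x, y, Z)
print(f"marginal correlation: {marginal:.3f}")
print(f"partial correlation given Z: {partial:.3f}")
```

In this simulation the marginal correlation is driven by the shared control variables, while the partial correlation should be close to zero. When the number of control variables reaches or exceeds the sample size, the least-squares residuals are all zero and this classical calculation is no longer usable, which is the setting the abstract says Chapter 4 addresses.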
Keywords/Search Tags:High-dimensional data, Mahalanobis distance correlation, PLS, Test of partial correlation coefficient, Generalized projection