
Studies On Regression Analysis For Compositional Data

Posted on: 2019-10-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Chen
Full Text: PDF
GTID: 1360330551458778
Subject: Basic mathematics
Abstract/Summary:
Compositional data are multidimensional data that carry relative information between their components. They are widely used in geography, economics, biology and metabolomics. Because compositional data follow the Aitchison geometry, regression models for them differ from classical regression models for real-valued data and are comparatively complex. This dissertation studies the imputation of rounded zeros in high-dimensional compositional data and regression analysis with compositional response and covariates, including multiple linear regression, heteroskedastic linear regression and partial least squares regression.

Chapter 2 gives the preliminaries for compositional data, including the Aitchison geometry, coordinate representation, the matrix product operation, and the center and variability of compositional data. It also studies the properties of the matrix product operation and of the sample center of compositional data.

Chapter 3 studies the multiple linear regression model with compositional response and covariates. It proposes a multiple linear regression model in the simplex and a model with isometric logratio coordinates in real space. The model in the simplex is based on the matrix product operation; it can handle responses and covariates with different numbers of parts, and the regression coefficients of each part of a covariate can differ. Theorems in this chapter give the relation between the regression coefficients of the two models, together with the estimation and significance testing of the regression coefficients. Finally, the proposed models are used to analyze the relationship between the consumption structure and the age structure of Shanxi. Compared with a regression model based on the original data in real space, the interpretation of the regression coefficients in the proposed models is closer to the actual situation, and the proposed models have a smaller prediction error.

Chapter 4 studies the heteroskedastic linear regression model with compositional response and covariates. Unlike Chapter 3, this chapter considers the case in which the error term is compositional and heteroskedastic. The regression coefficients are estimated by weighted least squares. For the significance test of the regression coefficients, the test statistic is calculated from the ordinary least squares estimator and the corresponding heteroskedasticity-consistent covariance matrix estimator. Finally, the proposed method is applied to a simulation study and a real example and compared with the ordinary least squares method; the results show that the proposed method is superior in both parameter estimation and significance testing.

Chapter 5 studies the imputation of rounded zeros in high-dimensional compositional data and proposes a method based on regression imputation with Q-mode clustering. The proposed method first clusters the parts of the compositional data through Q-mode cluster analysis, then builds a partial least squares regression between the subcompositions in one group and those in the other groups, and finally imputes the rounded zeros in the subcomposition using the idea of the EM algorithm. Using centered logratio coefficients or isometric logratio coordinates for the response of the partial least squares regression gives equivalent results for the replacement of rounded zeros. The performance of the proposed method is compared with existing methods through a simulation study and a real example; the results show that the proposed method improves time efficiency and accuracy in high dimensions.

Chapter 6 studies the partial least squares regression model with compositional response variables and covariates. Unlike Chapter 3, this chapter considers the case in which the total number of parts of all covariates exceeds the number of sample observations, and solves the following two problems: (1) How can a partial least squares regression model be built in the original sample space of compositional data, that is, the simplex? (2) What is the relationship between the model proposed in the simplex and the partial least squares model with centered logratio coefficients in real space? For the first problem, the definitions of the sample covariance between compositional variables and of multivariate linear regression in the simplex are critical. For the second problem, this chapter proves that the two models are equivalent, that is, the regression coefficients estimated by the two models are the same. Finally, the relationship between urine metabolite components and blood metabolite components is analysed using real metabolomics data; the results show that the interpretation of the regression coefficients is consistent with their biological significance.

The results of this study further improve regression analysis with compositional data. The proposed models are built directly on the Aitchison geometry of compositional data; they are significant for explaining economic phenomena and for studying the interdependence between biochemical indicators and metabolites in metabolomics.
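
For readers unfamiliar with the coordinates named in the abstract, the following is a minimal Python sketch, not code from the dissertation. It assumes NumPy, strictly positive parts (no zeros), and one particular orthonormal (Helmert-type) basis for the isometric logratio coordinates; other bases are equally valid. All function names (`closure`, `clr`, `helmert_basis`, `ilr`, `ilr_inv`, `fit_ilr_ols`, `predict_ilr_ols`) are hypothetical. The last two functions illustrate only the generic idea of ordinary least squares on ilr coordinates; they do not implement the simplex model based on the matrix product operation, the weighted least squares estimator, or the partial least squares models described in the abstract.

```python
import numpy as np


def closure(x):
    """Rescale positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered logratio coefficients: log parts minus their mean log."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)


def helmert_basis(D):
    """One orthonormal basis, shape (D-1, D), of the zero-sum clr hyperplane.

    This Helmert-type choice is an assumption made for illustration.
    """
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i          # average of the first i parts ...
        V[i - 1, i] = -1.0              # ... contrasted with part i+1
        V[i - 1] *= np.sqrt(i / (i + 1.0))  # scale the row to unit length
    return V


def ilr(x):
    """Isometric logratio coordinates with respect to helmert_basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ helmert_basis(x.shape[-1]).T


def ilr_inv(z):
    """Map ilr coordinates back to compositions in the simplex."""
    z = np.asarray(z, dtype=float)
    D = z.shape[-1] + 1
    return closure(np.exp(z @ helmert_basis(D)))


def _design(X):
    """Intercept column followed by the ilr coordinates of the covariate."""
    Zx = ilr(X)
    return np.column_stack([np.ones(len(Zx)), Zx])


def fit_ilr_ols(X, Y):
    """Ordinary least squares of ilr(Y) on an intercept plus ilr(X).

    X and Y are (n, Dx) and (n, Dy) arrays of compositions; Dx and Dy may differ.
    Returns the (Dx, Dy - 1) coefficient matrix in ilr coordinates.
    """
    B, *_ = np.linalg.lstsq(_design(X), ilr(Y), rcond=None)
    return B


def predict_ilr_ols(B, X):
    """Predict in ilr coordinates, then map the result back to the simplex."""
    return ilr_inv(_design(X) @ B)
```

Because the ilr map is an isometry, Euclidean distances between coordinate vectors equal Aitchison distances between the corresponding compositions, which is why standard least squares can be applied to the coordinates and the fitted values mapped back to the simplex.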
Keywords/Search Tags:Compositional data, Isometric logratio coordinates, Regression model, Heteroskedasticity, Rounded zeros