Hypothesis Test And Application Of High-Dimensional Directed Acyclic Graphical Models For Exponential Distribution Family

Posted on: 2024-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Y Chen
Full Text: PDF
GTID: 1520307307490534
Subject: Mathematical Statistics
Abstract/Summary:
The analysis of causal effects among random variables has become a popular topic in statistics and machine learning, with wide applications in neuroinformatics and bioinformatics. Such an analysis is usually performed through a Bayesian network model or a vector autoregressive model. A Bayesian network is a special graphical model whose structure represents the conditional independence relations among a set of random variables. In this dissertation, we use Poisson regression to model the conditional distributions of the discrete random variables of interest. A directed edge between a pair of variables represents the causal relationship between them. We also prove the structure estimation consistency of the Poisson regression model. To obtain a sparse structure, we employ the truncated L1 penalty and introduce constraints to guarantee acyclicity. We employ a constrained likelihood ratio test to perform statistical inference on directed linkages and directed pathways, derive the asymptotic distributions of the test statistic in the high-dimensional setting under both the null and the alternative hypotheses, and examine the size and power of the test using simulated data. We apply the constrained likelihood ratio test to basketball statistics of NBA players from the 2016-2017 season.
With the availability of high-dimensional data in bioinformatics and related fields and the surge in the corresponding demand for analysis, generalized linear models (GLMs) are widely used to analyze high-dimensional data. Testing the significance of regression coefficients has a wide range of applications in biological problems such as finding important gene sets, and statistical inference for generalized linear models in high-dimensional settings has long been a focus of research. The second focus of this dissertation is to generalize the statistical inference method for Bayesian networks to generalized linear models. We no longer specify the form of the distribution of the random variables, but only assume that their conditional distributions belong to an exponential family. To ensure acyclicity of the graph structure, we again constrain the network structure so that no directed cycles arise. We use the constrained likelihood ratio test to perform statistical inference on directed edges and directed pathways, and use simulated data to verify the size and power of the tests. We apply the constrained likelihood ratio test to the Down Syndrome Mouse dataset to determine whether the overexpression of genes perturbs the processes and pathways related to brain development and function, and to identify perturbations in pathways that are critical to learning and memory.
Vector autoregression (VAR) is a statistical model used to capture the relationships among multiple variables as they change over time. As in the autoregressive model, each variable has an equation modelling its evolution over time. This equation includes the variable’s lagged (past) values, the lagged values of the other variables in the model, and an error term. The third focus of this dissertation is to use Gaussian Network Autoregression (GNAR) to model multivariate time series. Unlike traditional vector autoregressive models, it captures the instantaneous structural relationships between variables within the model system. When structural autoregressive models are applied in practice, the error terms are almost always assumed to follow a joint multivariate Gaussian distribution, which means that the joint distribution of the errors is completely determined by their covariance. Without additional information or constraints, the structural errors therefore cannot be identified, because any orthogonal transformation of them can also be an optimal solution of the model. In this dissertation, we combine the network and the autoregressive model. We assume that the error terms are Gaussian with equal variances, add acyclicity constraints, and prove the identifiability of the Gaussian autoregressive network and the consistency of the model. We apply the GNAR model to find causal relations among several world stock indices.
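To make two ingredients named in the abstract concrete, the following is a minimal sketch of a truncated L1 penalty and an acyclicity check. It is illustrative only and not the dissertation's implementation. The abstract does not state the penalty's exact form, so the commonly used form min(|beta| / tau, 1) is assumed here. The function names, the threshold `tol`, and the convention that `coef[k][j] != 0` encodes an edge from variable j to variable k are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def truncated_l1(beta: float, tau: float) -> float:
    """Assumed truncated L1 penalty min(|beta| / tau, 1) for one coefficient."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return min(abs(beta) / tau, 1.0)


def edges_from_coefficients(
    coef: Sequence[Sequence[float]], tol: float = 1e-8
) -> List[Tuple[int, int]]:
    """List directed edges j -> k wherever |coef[k][j]| exceeds tol.

    Assumed convention: row k holds the regression coefficients of
    variable k on its candidate parents; the diagonal is ignored.
    """
    edges = []
    for k, row in enumerate(coef):
        for j, value in enumerate(row):
            if j != k and abs(value) > tol:
                edges.append((j, k))
    return edges


def is_acyclic(n_nodes: int, edges: List[Tuple[int, int]]) -> bool:
    """Return True if the directed graph has no directed cycle.

    Uses Kahn's algorithm: repeatedly remove nodes with no incoming
    edges; the graph is acyclic exactly when every node gets removed.
    """
    indegree = [0] * n_nodes
    children: Dict[int, List[int]] = {i: [] for i in range(n_nodes)}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    ready = [i for i in range(n_nodes) if indegree[i] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == n_nodes


if __name__ == "__main__":
    # Hypothetical 3-variable coefficient matrix: 0 -> 1 and 1 -> 2.
    coef = [
        [0.0, 0.0, 0.0],
        [0.8, 0.0, 0.0],
        [0.0, 0.5, 0.0],
    ]
    edges = edges_from_coefficients(coef)
    print(edges, is_acyclic(3, edges))
    # Total assumed penalty over all off-diagonal coefficients, tau = 0.1.
    print(sum(truncated_l1(v, 0.1) for k, row in enumerate(coef)
              for j, v in enumerate(row) if j != k))
```

With this form, the penalty grows linearly for small coefficients and is capped at 1 once a coefficient exceeds tau in absolute value, so large effects are not shrunk further. The acyclicity check only validates a given structure after the fact; it does not reproduce how the dissertation imposes acyclicity constraints during estimation.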
Keywords/Search Tags: Constrained likelihood ratio test, Difference of convex functions programming, Causal inference, Poisson Bayesian network, Gaussian network autoregression