Research And Applications Of Causal Discovery Algorithms For Functional Causal Models

Posted on: 2022-10-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zeng
GTID: 1488306317994139
Subject: Computer applications engineering
Abstract/Summary:
In recent years, causal discovery from observational data has attracted many researchers and has been widely applied in financial economics, neuroinformatics, biomedicine, and other fields. Causal discovery based on Functional Causal Models (FCMs) is a research hotspot because it guarantees unique identification of the causal structure and thereby resolves the Markov equivalence problem in causal discovery. It helps with problems such as interpreting data sources, formulating intervention measures, and multi-domain learning in real-world applications. This research covers two tasks: Task 1 is learning the causal relationships between observable variables, and Task 2 is learning those between hidden variables (or latent factors). However, causal discovery for different functional causal models still encounters problems on these tasks: the algorithm based on the linear non-Gaussian FCM for Task 1 is neither effective nor efficient because its search over causal orderings is not optimal; the algorithm based on the nonlinear FCM for Task 1 lacks an identification theory for causal networks because of the complexity and intransmissibility of nonlinear models; and the algorithm based on the latent-variable FCM for Task 2 cannot handle multi-domain data and suffers from problems such as insufficient representation ability of the model. In response, this dissertation starts from different types of functional causal models and investigates models, theories, algorithms, and applications in depth. Specifically, the research content and main contributions are as follows.

(1) To address the low identification accuracy and high computational complexity of causal discovery algorithms for linear functional causal models, we propose GPL (Giving Priority to Leaf-nodes), a LiNGAM algorithm under the causal tree assumption. It preferentially selects leaf nodes to estimate the causal structure between observable variables. Because leaf nodes do not affect other nodes in the causal structure, the GPL algorithm selects leaf nodes directly, one by one, in a bottom-up manner; determines the causal ordering without other operations such as a data update process; and finally applies pruning methods to obtain the final structure. In addition to the efficient algorithm design, we present a theoretical analysis of the algorithm in terms of accuracy and complexity. Experiments on synthetic data and on real data from wireless network optimization confirmed the superiority of the GPL algorithm in computational complexity and accuracy, especially when the number of variables is large or the sample size is small.

(2) To address the lack of a theoretical guarantee of causal network identification in causal discovery algorithms for nonlinear functional causal models, we first establish a nonlinear functional causal model, the multiple High-Dimensional Deterministic Model (HDDM). It is based on multiple high-dimensional variables, and we analyze its key property: after the data are transformed, any two dependent variables exhibit an asymmetry. This asymmetry is the basis for deriving the identification guarantee of causal networks. Next, we design two selection rules for candidate parent nodes and, based on these two rules, propose an integrated causal discovery algorithm with HDDM. We give a consistency analysis of the algorithm. Finally, experiments on simulated data verified the correctness of our theory. We also applied the algorithm to the Yahoo stock indices data set to discover the causal information behind the stock data.

(3) To further explore the causal relationships behind data and enhance the representation ability of the model, we focus on the causal relationships between hidden variables. To address the inability of algorithms for latent-variable functional causal models to handle multi-domain data, we first establish a functional causal model based on multi-domain latent variables, i.e., Multi-Domain Linear Non-Gaussian Acyclic Models for LAtent Factors, abbreviated as MD-LiNA, and we provide its identification results. MD-LiNA is a new causal representation model. We also propose an integrated two-stage method. In the first stage, we use the so-called Triad constraints, based on pseudo residuals, together with factor analysis methods to locate the latent factors in all domains. Then we estimate the factor loading matrix, which connects observable variables and latent factors. In the second stage, based on the independence of noises and on the dependence between the latent factors in different domains and the latent factors of interest, we establish our objective function and constraints such as the acyclicity constraint. MD-LiNA addresses the multi-domain problem, namely: when the latent factors are multi-domain, how can we estimate the shared causal structure between the latent factors of interest, given the different causal structures between latent factors in different domains? Experiments on synthetic data and on real fMRI hippocampus data showed that our method can efficiently handle both single-domain and multi-domain data.

In general, this dissertation aims to construct causal discovery algorithms with high efficiency and strong generalization ability; takes the two tasks as its direction with the linear non-Gaussian FCM, the nonlinear FCM, and the hidden-variable FCM; and addresses the problem of causal discovery for FCMs. This research can serve as guidance and a reference for different practical applications.
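
The bottom-up, leaf-first ordering described in contribution (1) can be illustrated with a small sketch. This is not the GPL algorithm itself: the abstract does not specify how GPL detects leaf nodes from data, so the sketch assumes the causal graph is already known and only shows how repeatedly removing leaf nodes yields a causal ordering. The function name `leaf_first_ordering` and the example tree are hypothetical.

```python
import numpy as np


def leaf_first_ordering(adj):
    """Return a causal ordering (causes before effects) by peeling off leaves.

    Assumption for illustration only: the graph is given as an adjacency
    matrix where adj[i, j] != 0 means there is an edge i -> j. A data-driven
    method would have to identify leaves statistically instead.
    """
    has_edge = np.asarray(adj) != 0
    remaining = list(range(has_edge.shape[0]))
    reversed_order = []
    while remaining:
        # A leaf has no outgoing edge to any node that is still in the graph.
        leaves = [i for i in remaining if not has_edge[i, remaining].any()]
        if not leaves:
            raise ValueError("no leaf found: the graph contains a cycle")
        leaf = leaves[0]
        reversed_order.append(leaf)  # leaves are collected effects-first
        remaining.remove(leaf)
    return reversed_order[::-1]


# Hypothetical causal tree: 0 -> 1, 0 -> 2, 1 -> 3
tree = np.zeros((4, 4), dtype=int)
tree[0, 1] = tree[0, 2] = tree[1, 3] = 1
order = leaf_first_ordering(tree)
print(order)
```

Because a removed leaf has no effect on the remaining nodes, nothing needs to be recomputed for them after each removal, which is the property the abstract points to when it says the ordering is determined without a data update process.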
Keywords/Search Tags: Causal Discovery, Functional Causal Models, Causal Ordering, Observable Variables, Hidden Variables