
Causal Discovery On High Dimensional Data

Posted on: 2016-07-04  Degree: Master  Type: Thesis
Country: China  Candidate: H Zhang  Full Text: PDF
GTID: 2308330461455978  Subject: Mathematics
Abstract/Summary:
Causality is one of the fundamental problems in the natural sciences. Although many researchers have worked on uncovering causal relationships, existing causal discovery algorithms are usually neither effective nor efficient enough on high-dimensional data, because high dimensionality reduces discovery accuracy and increases computational complexity. To alleviate these problems, we present a three-phase approach for learning the structure of nonlinear causal models that combines a feature selection method with two state-of-the-art causal discovery methods. In the first phase, a greedy search based on Max-Relevance and Min-Redundancy (mRMR) discovers the candidate causal set, from which a rough skeleton of the causal network is generated. In the second phase, a constraint-based method refines the rough skeleton into an accurate skeleton. In the third phase, the information-geometric direction learning algorithm IGCI determines the direction of the causal relationships in the accurate skeleton. The experimental results show that the proposed approach is both effective and scalable, with particularly interesting findings on high-dimensional data.

Specifically, the content and main contributions of this work are as follows:

(1) High dimensionality reduces discovery accuracy and increases computational complexity in causal discovery. To alleviate these problems, we propose a greedy search method for discovering the candidate causal skeleton based on the Max-Relevance and Min-Redundancy (mRMR) criterion. The method is reliable and robust even with small datasets in the high-dimensional case.

(2) In the second phase, a constraint-based method is used to derive the accurate skeleton from the rough skeleton. To learn a better causal structure even on nonlinear data, we employ the kernel-based conditional independence test proposed by Kun Zhang instead of conventional independence tests.

(3) Because the data are nonlinear, we use the information-geometric direction learning algorithm IGCI proposed by Janzing to determine the direction of the causal relationships in the accurate skeleton. IGCI breaks the symmetry between variables, so the model can identify the causal relationships between them. This addresses a limitation of traditional Bayesian network approaches, which can only partially identify the causal relationships between variables. With the accurate skeleton obtained in the first two phases, the direction learning algorithm can identify the causal relationships between variables in high-dimensional data.

The proposed algorithm is tested on simulated data, real networks, and real-world data to demonstrate its effectiveness, and is compared with two state-of-the-art causal discovery methods. Experimental results show that the proposed algorithm is effective and stable on large-scale causal discovery problems.
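To make the direction-learning step of the third phase more concrete, the following is a minimal sketch of one common IGCI variant, the slope-based estimator with a uniform reference measure. It is an illustration under stated assumptions, not the thesis's implementation: the function names `igci_slope` and `infer_direction`, the min-max rescaling, and the toy data `y = x**3` are all chosen here for the example. The score is averaged over the sample pairs that remain after dropping ties.

```python
import numpy as np


def _rescale(v):
    """Min-max rescale a sample to [0, 1] (uniform reference measure)."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())


def igci_slope(x, y):
    """Slope-based IGCI score for the hypothesis x -> y.

    Hypothetical helper for illustration: the mean of log|dy/dx| over
    consecutive samples sorted by x. A lower score favours x -> y.
    """
    x, y = _rescale(x), _rescale(y)
    order = np.argsort(x)
    dx = np.diff(x[order])
    dy = np.diff(y[order])
    keep = (dx != 0) & (dy != 0)  # drop ties to avoid log(0) and division by 0
    return float(np.mean(np.log(np.abs(dy[keep] / dx[keep]))))


def infer_direction(x, y):
    """Return the direction with the smaller slope-based score."""
    return "X->Y" if igci_slope(x, y) < igci_slope(y, x) else "Y->X"


# Toy data (an assumption for this example): uniform cause, nonlinear effect.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)
y = x ** 3

print(infer_direction(x, y))
```

In a pipeline like the one described above, such a pairwise test would be applied only to the edges that survive in the accurate skeleton, which keeps the number of direction tests small in the high-dimensional case.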
Keywords/Search Tags: Causal relationship, High dimensional data, Causal network