
Causal Discovery Based On Structural Equation Model

Posted on: 2014-07-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Yang
GTID: 1268330398979831
Subject: Computer application technology
Abstract/Summary:
Structural equation models (SEMs), a class of continuous Bayesian network models, have drawn growing interest from researchers worldwide. To improve the accuracy and efficiency of SEM structure learning, we develop several algorithms for datasets that follow arbitrary distributions, whether Gaussian or non-Gaussian. To extend their use, we apply the algorithms developed in this thesis to the health field. The main contributions of the dissertation are as follows:
1. The accuracy and efficiency of current structure learning methods for linear SEMs can be improved. To this end, we first prove that partial correlation can serve as the criterion for conditional independence (CI) tests on datasets generated by a linear SEM whose disturbances are uncorrelated and follow arbitrary distributions, regardless of whether the data follow a multivariate Gaussian distribution. Combining local learning with partial correlation, we then propose a partial-correlation-based Bayesian network structure learning algorithm, called PCB. To reduce the search space substantially, PCB uses partial correlation to select potential neighbors (parent and child nodes) and reconstruct the skeleton of the Bayesian network. To obtain the final causal structure, PCB then orients the edges with a constrained greedy hill-climbing search. Finally, we demonstrate the effectiveness of PCB through both theoretical analysis and simulations. However, choosing the optimal partial-correlation threshold requires a large number of experiments, so a more efficient method for this task is desirable.
2. As noted in Chapter 3, the optimal threshold of the PCB algorithm has to be selected through a large number of experiments. To tackle this problem, we combine local learning with simultaneous equations model techniques and propose a new causal structure learning algorithm, called BSEM.
For datasets generated by a linear SEM, we show that the coefficients of the simultaneous equations model can be used to measure the influence of each variable. BSEM selects potential neighbors based on these coefficients to reconstruct the skeleton of the Bayesian network, which greatly reduces the search space. A constrained greedy hill-climbing search then orients the edges to obtain the final causal structure. We demonstrate the effectiveness of BSEM through both theoretical analysis and simulations; a simulation study shows that BSEM achieves higher accuracy and better time performance. Selecting BSEM's optimal threshold by hypothesis testing overcomes the shortcoming of PCB's threshold selection method, although the time performance of BSEM declines slightly. When the node ordering of the variables is known in advance, we propose an algorithm based on recursive simultaneous equations models, called RSEM. RSEM uses this prior knowledge and selects the parent nodes of each target node from the nodes that precede it, based on the equation coefficients. RSEM achieves high accuracy, and its time performance is significantly improved.
3. Building on PCB, we propose an improved partial-correlation-based Bayesian network structure learning algorithm, called IPCB. For datasets generated by a linear SEM with a sample size that is not small, the test statistic of the partial correlation coefficient follows a t-distribution. IPCB combines hypothesis testing with partial correlation coefficients to select the candidate neighbors of each target node, then performs a greedy hill-climbing search in the constrained space to obtain the final causal structure. We demonstrate the effectiveness of the algorithm both theoretically and experimentally.
A simulation study shows that IPCB achieves higher accuracy and better time performance. IPCB solves the threshold selection problem of PCB and compensates for the weaker time performance of BSEM.
4. Building on these causal discovery techniques, we conduct applied research in the health field, applying causal discovery algorithms to cross-sectional survey data to reveal potential causal links. Using the LIMB, TC, PCB, BSEM, and IPCB algorithms, we analyze causal relations in real cross-sectional data from the National Health and Nutrition Examination Survey (NHANES). The experimental results show that these algorithms can discover causal relations that may help researchers better understand biological mechanisms and support medical research.
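Contribution 1 rests on using partial correlation as the CI criterion. A minimal Python sketch, assuming NumPy, computes full-order partial correlations from the inverse of the sample covariance matrix and keeps, for each target, the variables whose absolute partial correlation exceeds a threshold. The function names, the 0.1 threshold, and the three-variable chain with uniform (non-Gaussian) disturbances are illustrative assumptions, not the thesis's code. Conditioning on all other variables selects the Markov blanket, so spouses (co-parents) can also appear among the candidates.

```python
import numpy as np

def partial_correlations(data):
    """Full-order partial correlations rho_ij given all other variables.

    Uses rho_ij|rest = -P_ij / sqrt(P_ii * P_jj), where P is the inverse
    of the sample covariance matrix (the precision matrix).
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

def candidate_neighbors(data, threshold):
    """Hypothetical PCB-style filter: keep j for target i if |rho_ij|rest| > threshold."""
    pcorr = partial_correlations(data)
    p = pcorr.shape[0]
    return {i: [j for j in range(p) if j != i and abs(pcorr[i, j]) > threshold]
            for i in range(p)}

# Linear SEM X0 -> X1 -> X2 with uncorrelated uniform (non-Gaussian) disturbances.
rng = np.random.default_rng(0)
n = 20000
e = rng.uniform(-1.0, 1.0, size=(n, 3))
x0 = e[:, 0]
x1 = 0.8 * x0 + e[:, 1]
x2 = 0.8 * x1 + e[:, 2]
data = np.column_stack([x0, x1, x2])

pc = partial_correlations(data)
print(round(float(pc[0, 2]), 3))  # close to zero: X0 and X2 are independent given X1
print(candidate_neighbors(data, threshold=0.1))
```

In this chain X0 and X2 are marginally dependent, but their partial correlation given X1 is near zero, so only the true neighbors pass the filter. The threshold is the tuning knob that, according to the abstract, PCB has to choose by experiment.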
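The orientation step shared by PCB, BSEM, and IPCB is a greedy hill-climbing search confined to the candidate skeleton. The sketch below is a generic version under assumed choices: a per-node Gaussian BIC score, add/delete/reverse moves, and an acyclicity check. The abstract does not specify the score or the move set, and all names here (node_bic, constrained_hill_climb, min_gain) are hypothetical.

```python
import numpy as np

def node_bic(data, child, parents):
    """Gaussian BIC term for one node regressed on its parents (higher is better)."""
    n = data.shape[0]
    X = np.column_stack([np.ones(n)] + [data[:, q] for q in parents])
    y = data[:, child]
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1  # regression coefficients plus the residual variance
    return -0.5 * n * np.log(rss / n) - 0.5 * k * np.log(n)

def creates_cycle(parents, u, v):
    """True if adding u -> v would close a cycle, i.e. v is already an ancestor of u."""
    stack, seen = [u], set()
    while stack:
        node = stack.pop()
        if node == v:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])  # walk backwards along parent links
    return False

def constrained_hill_climb(data, skeleton, min_gain=1e-6):
    """Greedy add/delete/reverse search that only touches edges in the candidate skeleton."""
    p = data.shape[1]
    parents = {v: [] for v in range(p)}
    score = {v: node_bic(data, v, []) for v in range(p)}
    while True:
        best_gain, best_move = min_gain, None
        for u, v in skeleton:
            for a, b in ((u, v), (v, u)):
                if a in parents[b]:
                    # Option 1: delete a -> b.
                    new_b = [q for q in parents[b] if q != a]
                    gain = node_bic(data, b, new_b) - score[b]
                    if gain > best_gain:
                        best_gain, best_move = gain, ("delete", a, b)
                    # Option 2: reverse a -> b into b -> a (cycle check without a -> b).
                    trial = {k: list(vs) for k, vs in parents.items()}
                    trial[b] = new_b
                    if not creates_cycle(trial, b, a):
                        gain = (node_bic(data, b, new_b) - score[b]
                                + node_bic(data, a, parents[a] + [b]) - score[a])
                        if gain > best_gain:
                            best_gain, best_move = gain, ("reverse", a, b)
                elif b not in parents[a] and not creates_cycle(parents, a, b):
                    # Option 3: add a -> b (edge currently absent in both directions).
                    gain = node_bic(data, b, parents[b] + [a]) - score[b]
                    if gain > best_gain:
                        best_gain, best_move = gain, ("add", a, b)
        if best_move is None:
            return parents
        kind, a, b = best_move
        if kind in ("delete", "reverse"):
            parents[b] = [q for q in parents[b] if q != a]
        if kind == "reverse":
            parents[a] = parents[a] + [b]
        if kind == "add":
            parents[b] = parents[b] + [a]
        for node in (a, b):
            score[node] = node_bic(data, node, parents[node])

# Toy data X0 -> X2 <- X1; the skeleton is assumed to come from a PCB-style filter.
rng = np.random.default_rng(2)
n = 5000
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = 0.8 * x0 + 0.8 * x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])
dag = constrained_hill_climb(data, skeleton=[(0, 2), (1, 2)])
print(dag)
```

Greedy search can stop at a local optimum, and Markov-equivalent DAGs receive the same BIC score, so the printed parent sets need not match the generating DAG exactly. Restricting moves to the skeleton is what keeps each iteration down to a handful of score evaluations.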
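With a known node ordering, RSEM only has to decide which preceding nodes are parents. The sketch below assumes that decision is an ordinary least-squares t-test on each coefficient with a large-sample critical value of 3.29 (roughly a two-sided level of 0.001). The abstract states only that parents are selected "based on equation coefficients", so the test, the cut-off, the function name rsem_parents, and the toy SEM are assumptions. A hypothetical BSEM-like variant would instead regress each node on all the other variables.

```python
import numpy as np

def rsem_parents(data, order, t_crit=3.29):
    """Hypothetical RSEM-style sketch: regress each node on the nodes that precede it
    in a known causal order and keep predecessors whose OLS coefficient is significant.

    t_crit = 3.29 is roughly a two-sided level of 0.001 under a large-sample normal approximation.
    """
    n = data.shape[0]
    parents = {}
    for pos, target in enumerate(order):
        preds = list(order[:pos])
        if not preds:
            parents[target] = []
            continue
        # Design matrix: intercept column plus the preceding variables.
        X = np.column_stack([np.ones(n), data[:, preds]])
        y = data[:, target]
        beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = float(resid @ resid) / (n - X.shape[1])
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
        t = beta / se
        # Skip the intercept (index 0) when selecting parents.
        parents[target] = [v for v, tv in zip(preds, t[1:]) if abs(tv) > t_crit]
    return parents

# Linear SEM X0 -> X2 <- X1, X2 -> X3 with uniform disturbances; order 0, 1, 2, 3 is known.
rng = np.random.default_rng(1)
n = 5000
e = rng.uniform(-1.0, 1.0, size=(n, 4))
x0 = e[:, 0]
x1 = e[:, 1]
x2 = 0.7 * x0 - 0.6 * x1 + e[:, 2]
x3 = 0.9 * x2 + e[:, 3]
data = np.column_stack([x0, x1, x2, x3])

print(rsem_parents(data, order=[0, 1, 2, 3]))
```

Because each node is regressed only on its predecessors, the ordering removes both the orientation search and any need to test non-predecessors, which is consistent with the time savings the abstract reports for RSEM.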
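For contribution 3, the standard statistic for a sample partial correlation r computed from n samples with k conditioning variables is t = r * sqrt((n - 2 - k) / (1 - r^2)), referred to a t-distribution with n - 2 - k degrees of freedom. The sketch below uses this test in place of PCB's hand-tuned threshold. To avoid dependencies beyond NumPy, it approximates the two-sided t p-value with the standard normal tail, which is close when n is large (the regime the abstract assumes); scipy.stats.t.sf gives the exact tail. Function names and the significance level are illustrative assumptions.

```python
import math
import numpy as np

def partial_corr_t_stat(r, n, k):
    """t statistic for a sample partial correlation r with k conditioning variables.

    Under H0: rho = 0 it is referred to a t distribution with n - 2 - k degrees of freedom.
    """
    df = n - 2 - k
    return r * math.sqrt(df / (1.0 - r * r)), df

def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal tail; a large-df stand-in for the t tail."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def select_neighbors_by_test(data, alpha=0.01):
    """Hypothetical IPCB-style filter: keep j for target i when H0: rho_ij|rest = 0 is rejected."""
    n, p = data.shape
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    selected = {}
    for i in range(p):
        selected[i] = []
        for j in range(p):
            if j != i:
                # Full-order partial correlation: condition on the other p - 2 variables.
                t, _ = partial_corr_t_stat(float(pcorr[i, j]), n, k=p - 2)
                if two_sided_p_normal(t) < alpha:
                    selected[i].append(j)
    return selected

# Linear SEM X0 -> X1 -> X2 with uniform (non-Gaussian) disturbances.
rng = np.random.default_rng(0)
n = 5000
e = rng.uniform(-1.0, 1.0, size=(n, 3))
x0 = e[:, 0]
x1 = 0.8 * x0 + e[:, 1]
x2 = 0.8 * x1 + e[:, 2]
data = np.column_stack([x0, x1, x2])

print(select_neighbors_by_test(data, alpha=0.001))
```

The significance level alpha replaces an arbitrary cut-off on |r|, and the effective threshold on r now shrinks automatically as n grows.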
Keywords/Search Tags: Bayesian network model, linear structural equation model, structure learning, local learning, partial correlation