Study of Causal Relationship Discovery Using Constraint-Based Methods from Observational Data

Posted on: 2015-02-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Jin
Full Text: PDF
GTID: 1268330425994712
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The essence of humanity's exploration of nature is to find the causes of phenomena; these causes explain what we observe and help us understand and master the laws of nature. A well-designed randomized controlled experiment is the most effective way to identify causality, but it may be infeasible because of cost or ethical restrictions. With the rapid development of data collection and storage technology, massive amounts of observational data are being generated in fields such as engineering, medicine, and science. When a randomized experiment cannot be carried out, causal relationships can instead be sought in observational data. Causal discovery methods based on observational data are designed to reveal the causality embedded in these large data sets.
Discovering causal relationships from large observational databases is both important and challenging. There is no generally accepted definition of a causal relationship; the term has different meanings in different fields, which makes it difficult to state in a unified form. The causal Bayesian network is an important causal model that can visually represent the causality embedded in observational data under the causal sufficiency condition. However, an ideal causal graph usually cannot be constructed unless causal sufficiency holds for all the observed variables. In addition, the cost of learning a causal Bayesian network increases exponentially with the number of variables, making it infeasible to learn a full network when the number of variables is large. In practice, people may not be interested in all the causal relationships among the variables, so finding causal structures on a subset of the variables is of much greater practical significance.
To address the problems in learning and using current causal models, this dissertation aims to learn causal relationships on a subset of the variables, within the broader context of causal discovery from observational data. The major contents of this dissertation are as follows:
1. This dissertation studies various constraint-based methods for causal discovery and, to address the large number of conditional independence tests that current constraint-based methods must compute, proposes a new method based on persistent constraint. The method integrates the partial association of a variable under distinct covariates, which avoids measuring conditional independence. On the basis of the persistent constraint, an improved method based on equivalence classes is proposed, which effectively reduces the cost of applying the persistent constraint. A formal definition of direct causal relationship is given and formalized as causal rules based on extended default logic. This provides a concise syntax and formal expression for direct causal relationships and can be applied to structure models for causal prediction and causal diagnosis.
2. Because causal rules are an effective expression of direct causal relationships, finding causal rules from observational data on the basis of the foundational theory of direct causal relationship is of great practical significance. To overcome the shortcomings of traditional interestingness measures, causality is introduced into the interestingness measure of association rules, and a new method for evaluating causal rules using information entropy is proposed. The interactions among different association rules are treated as prior knowledge to eliminate false causal rules and find real ones using the interestingness measure. An algorithm is implemented on a real data set and proves efficient at mining causal rules from association rules.
Causal rule discovery based on association rules provides a feasible way to find causal relationships from observational data by taking full advantage of association rule mining techniques.
3. Because the causal Bayesian network is too complex to learn and to use for causal inference, this dissertation presents a general framework for direct causal relationship discovery based on formalized causal rules. Working layer by layer, the framework applies positive association and zero partial association to determine the causal rules, in keeping with the idea of persistent constraint. Causal relationships of single variables are extended to the combined-variable case, which has not been solved by traditional causal discovery methods. An efficient algorithm is presented according to the framework; it improves the efficiency of mining causal rules by using a well-designed ordinal equivalence storage table and ordinal local equivalence storage table, as well as some pruning techniques. In numerous experiments, the algorithm performs well and mines causal rules more efficiently than current methods.
Causality discovery is one of the most important topics in the field of knowledge discovery. This dissertation studies constraint-based methods for discovering causal relationships from observational data, discusses the formalization and inference of causal rules, and explores the practical mining of direct causal relationships from massive observational databases. This systematic work may be helpful for theoretical and practical research on causal discovery from observational data.
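The abstract does not specify how positive association and zero partial association are tested. As a rough illustration only, the sketch below uses the standard Cochran-Mantel-Haenszel test on stratified 2x2 tables, which is one common way to check whether two binary variables remain associated once a covariate is held fixed. The function name `cmh_statistic`, the table layout, and the example counts are assumptions made for this sketch; none of them come from the dissertation.

```python
import math

def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction).

    `strata` is a list of 2x2 tables ((a, b), (c, d)), one per level of the
    covariate. Rows are the candidate cause (present/absent) and columns are
    the outcome (present/absent). Hypothetical helper, not the author's method.
    """
    diff_sum = 0.0  # sum over strata of (observed a - expected a)
    var_sum = 0.0   # sum over strata of the hypergeometric variance of a
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        expected = (a + b) * (a + c) / n
        variance = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected
        var_sum += variance
    if var_sum == 0.0:
        return 0.0, 1.0
    stat = diff_sum ** 2 / var_sum
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Made-up counts: within each covariate level the two variables are independent.
stratified = [((40, 10), (8, 2)), ((2, 8), (10, 40))]
# The same counts with the covariate ignored (both strata added together).
collapsed = [((42, 18), (18, 42))]

partial_stat, partial_p = cmh_statistic(stratified)
marginal_stat, marginal_p = cmh_statistic(collapsed)
```

With these made-up counts the collapsed table shows a strong marginal association, while the stratified test finds none. That is the pattern a constraint-based method would read as an association explained by the covariate rather than a direct causal link.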
Keywords/Search Tags:causal relationship, causal rule, partial association, observational data, conditional independence, association rules, Bayesian network