Font Size: a A A

Using Incomplete Data For Structural Learning Of Graphical Models

Posted on:2017-06-10Degree:MasterType:Thesis
Country:ChinaCandidate:S ZhangFull Text:PDF
GTID:2349330482488260Subject:Statistics
Abstract/Summary:PDF Full Text Request
Graphical models are widely used to show and analyze causal relationships and conditional independencies among random variables. Graphical models mainly include directed acyclic graphs(DAGs), undirected graphs(UGs) and Chain graphs (CGs). Directed acyclic graphs (B aye si an networks) only include directed edges and without directed cycles, which are mainly used to describe the causal relationship between random variables. Undirected graphs(Markov networks) only include undi-rected edges, which are generally used to describe the relationship between random variables. The class of chain graphs includes both directed acyclic graphs and undi-rected graphs.Among a large number of research questions about graphical models, structural learning has caused extensive discussion. There are mainly two kinds of methods for structural learning:constraint-based methods and score-based methods. Most of structural learning approaches deal with only one database with completely ob-served data. With the development and popularity of computers, various databases have been built, which may contain different sets of variables and overlap each other. For example, in medical research, a researcher collects data of these vari-ables, another researcher may collect data of other variables, and they may contain common variables.In this paper, we propose two algorithms. One is a new algorithm for construct-ing a CG from multiple databases. The other is an algorithm for searching strong causal cut. The first algorithm is based on the multiple databases for structural learning of CGs. Firstly, we learn a local structure from each database separately. Secondly, we combine these structures together to construct a global graph over all variables. Thirdly, we delete spurious edges to construct the global skeleton. Finally, we orient the edges and construct the equivalence. This algorithm do not require conditional independence, which is a basic assumption in most methods. All the variables are separated in searching of the strong causal cut in the second algorithm. In this algorithm, each variable is assigned to the collection of A, B, C. In order to make the optimised result, we optimize the following two objectives during the assignment procedure, one is minimizing the size of C, the other is min-imizing the size difference of A and B. This algorithm is more efficient:firstly, the variables of C are reduced in the middle of the algorithm which avoids the size of C is too large, such that improve the efficiency of hypothesis testing. Secondly, the output of this algorithm is strong causal cut, which has many good properties, in the condition of strong causal cut, the DAGs have estimate collapsibility, condi-tional independence collapsibility and model collapsibility, which will improve the complexity and efficiency of statistical analysis. Finally, the hypothesis testing in this algorithm is processed in a relatively small number of variable sets, this will improve the effectiveness of constructing the large-scale sparse networks under the small sample. We prove the correctness of the algorithms under the faithfulness condition and give examples to illustrate the process of this two algorithms.
Keywords/Search Tags:Chain graph, Database, Directed acyclic graph, Causal cut, Structural learning
PDF Full Text Request
Related items