Using Incomplete Data For Structural Learning Of Graphical Models

Posted on:2017-06-10

Degree:Master

Type:Thesis

Country:China

Candidate:S Zhang

Full Text:PDF

GTID:2349330482488260

Subject:Statistics

Abstract/Summary:

PDF Full Text Request

Graphical models are widely used to show and analyze causal relationships and conditional independencies among random variables. Graphical models mainly include directed acyclic graphs(DAGs), undirected graphs(UGs) and Chain graphs (CGs). Directed acyclic graphs (B aye si an networks) only include directed edges and without directed cycles, which are mainly used to describe the causal relationship between random variables. Undirected graphs(Markov networks) only include undi-rected edges, which are generally used to describe the relationship between random variables. The class of chain graphs includes both directed acyclic graphs and undi-rected graphs.Among a large number of research questions about graphical models, structural learning has caused extensive discussion. There are mainly two kinds of methods for structural learning:constraint-based methods and score-based methods. Most of structural learning approaches deal with only one database with completely ob-served data. With the development and popularity of computers, various databases have been built, which may contain different sets of variables and overlap each other. For example, in medical research, a researcher collects data of these vari-ables, another researcher may collect data of other variables, and they may contain common variables.In this paper, we propose two algorithms. One is a new algorithm for construct-ing a CG from multiple databases. The other is an algorithm for searching strong causal cut. The first algorithm is based on the multiple databases for structural learning of CGs. Firstly, we learn a local structure from each database separately. Secondly, we combine these structures together to construct a global graph over all variables. Thirdly, we delete spurious edges to construct the global skeleton. Finally, we orient the edges and construct the equivalence. This algorithm do not require conditional independence, which is a basic assumption in most methods. All the variables are separated in searching of the strong causal cut in the second algorithm. In this algorithm, each variable is assigned to the collection of A, B, C. In order to make the optimised result, we optimize the following two objectives during the assignment procedure, one is minimizing the size of C, the other is min-imizing the size difference of A and B. This algorithm is more efficient:firstly, the variables of C are reduced in the middle of the algorithm which avoids the size of C is too large, such that improve the efficiency of hypothesis testing. Secondly, the output of this algorithm is strong causal cut, which has many good properties, in the condition of strong causal cut, the DAGs have estimate collapsibility, condi-tional independence collapsibility and model collapsibility, which will improve the complexity and efficiency of statistical analysis. Finally, the hypothesis testing in this algorithm is processed in a relatively small number of variable sets, this will improve the effectiveness of constructing the large-scale sparse networks under the small sample. We prove the correctness of the algorithms under the faithfulness condition and give examples to illustrate the process of this two algorithms.

Keywords/Search Tags:

Chain graph, Database, Directed acyclic graph, Causal cut, Structural learning

PDF Full Text Request

Related items

1	Structural Learning Of Directed Acyclic Graphs With Latent Variables By The Separation Tree
2	Study Of Ceramic Raw Materials Based On Multivariate Statistical Graph Analysis
3	The Influence Mechanism And Effect Of Monetary Policy On Chinese Residents' Consumption
4	Methods Of Detect Falsely Making Out Specialized VAT Invoices Behavior Based On Directed Graph
5	Research On The Multi-tasks Scheduling Problem
6	Interactions Between The Various Regions Of Beijing, Tianjin And Hebei Based On DAG
7	Study On The Spillover Effects Of U.S. Monetary Policy On China's Stock Market
8	Research On Prices Fluctuation And Transmission Of Chinese Hog Market Under Multi-scale Perspective
9	Analysis Of The Differences In The Inter-provincial Spillover Effect Among The Manufacturing Industries In Beijing-Tianjin-Hebei
10	The Study On Transmission Effects Of Intermediate Targets Of The Monetary Policy Based On DAG-SVAR