Research On Several Kinds Of Algorithms For Supporting High-dimensional Causal Discovery

Posted on: 2019-06-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Z Mai
GTID: 1318330545496717
Subject: Control Science and Engineering
Abstract/Summary:
With the advent of the big data era, causal inference algorithms have been widely used in fields such as economics, online social networks, and medical data analysis. As data volumes grow and high-dimensional data structures become more complex, causal inference on high-dimensional data has attracted much attention from researchers in China and abroad. High-dimensional data is a pervasive problem in industrial information intelligence, solving it is urgent, and it has become a hot topic in machine learning. This dissertation studies causal inference on high-dimensional data using theoretical tools such as D-separation, directed acyclic graphs, graph partitioning, variable independence, conditional independence, and the additive noise model (ANM). The main contributions of this dissertation are as follows:

1. The introduction elaborates the research background and significance of the causal inference problem and summarizes early results on causal inference. It then classifies and describes the current research status and hot issues in the field, briefly introduces some of the theoretical tools used in causality research, and finally states the research content and significance of this dissertation.

2. A fast partitioning method for high-dimensional causal networks is studied. Because causal inference on high-dimensional data is slow, this dissertation proposes a novel and efficient method for dividing causal variables: it recursively divides the original dataset into several smaller datasets using conditional independence tests, while ensuring that each child dataset preserves the corresponding D-separation properties of the original dataset. The causal information contained in each child dataset can therefore be inferred with an existing causal inference algorithm, and the results are then integrated into a complete causal network. Compared with existing causal inference methods, this method adopts a targeted partition strategy, which improves the efficiency of causal inference on high-dimensional data while maintaining accuracy.

3. A method for inferring causal direction on a high-dimensional causal network is studied. Because the accuracy of causal inference on high-dimensional data is still not high enough even when the causal skeleton is given, this dissertation proposes a method for inferring causal direction. First, three kinds of causal substructure are defined, namely ODS (one-degree structure), NTS (non-triangle structure), and TES (triangle-existence structure), and it is proved that any high-dimensional network can be composed of these three basic structures. Based on this, causal inference methods for the three basic structures are studied in detail. The high-dimensional network is then divided into several subnetworks, each corresponding to one of the three basic substructures, and their causal relationships are inferred. Finally, the complete causal network is formed by merging the subgraphs and removing redundant edges. On high-dimensional data, this method is significantly more accurate than existing methods.

4. A causal direction inference algorithm based on a hybrid of the additive noise model and conditional independence testing is studied. To address the high time complexity of the algorithm proposed in Chapter 3, this dissertation proposes an efficient method that infers causal direction using the V-structure property together with the residual-independence property of the additive noise model. The algorithm improves on the one in Chapter 3. When learning a causal substructure, the direction is inferred from the residual-independence property of the additive noise model if the target substructure is an ODS, and from the V-structure property if it is an NTS; if it is a TES, V-structure properties are used to infer the directions of the edges in its NTS part. Extensive experiments show that most edges can be oriented from V-structures, while the remaining edges, which lie in triangular structures, are oriented with the additive noise model. Finally, the subgraphs are assembled into a complete causal network after redundant edges are removed. Compared with the Chapter 3 algorithm on which it is based, this method greatly improves time efficiency.

5. A large-scale causal inference method based on a split-merge strategy is studied. Because existing causal inference algorithms for high-dimensional data are not ideal in speed or accuracy, this dissertation proposes a constraint-based method that uses normalized conditional mutual information (NCMI). First, the SADA method is used to split the high-dimensional network into several subgraphs. Then, for each subgraph, conditional mutual information is used to test dependence or independence between nodes and reconstruct the undirected causal graph, and ANM is used to determine the direction between every node and its adjacent nodes in the causal skeleton. Finally, the subgraphs are merged into a complete causal network after redundant edges are removed. This method shows good scalability and effectiveness on high-dimensional data.
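
The residual-independence idea behind ANM-based direction inference, as used in contributions 4 and 5, can be illustrated with a minimal sketch for a single pair of variables. The abstract does not say which regression method or independence test the dissertation uses, so everything here is an assumption made for illustration: the polynomial regression, the HSIC dependence score with a median-heuristic Gaussian kernel, and all function names (`hsic`, `residual`, `anm_direction`). The sketch regresses each variable on the other and prefers the direction whose residuals look less dependent on the presumed cause.

```python
import numpy as np


def _gram(v):
    """Gaussian kernel Gram matrix with a median-heuristic bandwidth (assumed choice)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    positive = d2[d2 > 0]
    sigma2 = np.median(positive) if positive.size else 1.0
    return np.exp(-d2 / sigma2)


def hsic(a, b):
    """Biased HSIC estimate; larger values suggest stronger dependence."""
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(_gram(a) @ h @ _gram(b) @ h) / n**2


def residual(cause, effect, degree=3):
    """Residual of a polynomial regression of effect on cause (assumed regressor)."""
    coeffs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coeffs, cause)


def anm_direction(x, y, degree=3):
    """Return the preferred direction and the two dependence scores.

    Under an additive noise model, the residual in the true causal
    direction should be (nearly) independent of the cause.
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    score_xy = hsic(x, residual(x, y, degree))  # hypothesis x -> y
    score_yx = hsic(y, residual(y, x, degree))  # hypothesis y -> x
    direction = "x->y" if score_xy < score_yx else "y->x"
    return direction, score_xy, score_yx


if __name__ == "__main__":
    # Synthetic pair generated as y = x^3 + noise, so x is the cause.
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 400)
    y = x**3 + rng.uniform(-1, 1, 400)
    print(anm_direction(x, y))
```

Comparing the two scores directly, rather than thresholding a p-value, keeps the sketch short; a practical implementation would normally use a calibrated independence test and a more flexible regressor.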
Keywords/Search Tags: causal discovery, high-dimensional data, causal subgraph, conditional independence test, ANM