Font Size: a A A

A Study On Recursive Structure Learning Based On Recursive Divide-and-Conquer Algorithm

Posted on:2022-12-01Degree:MasterType:Thesis
Country:ChinaCandidate:W H ZhangFull Text:PDF
GTID:2518306782452544Subject:Computer Software and Application of Computer
Abstract/Summary:PDF Full Text Request
Inferring causal relationships between variables from observational data is a hot topic in data science.With the continuous progress of science and technology,financial economy,social networks,smart medicine,big data smart city and other fields have generated massive and complex data through Internet interaction.How to effectively mine valuable information from these observational data and explore potential causal mechanisms in data plays an important role in the interpretability of science research and assisting decision-making,which promotes research progress in many scientific fields,such as biomedicine,social networks and finance.Recently,there have been a lot of studies on causal structure learning from observational data.For example,the constraint-based method with conditional independence test,under the causal faithfulness assumption,starts from a completely undirected independent graph and conducts conditional independence test by traversing the search condition candidate sets to perform edge deletion operation.Finally,the causal directions of other undirected edges are determined by V structure and some orientation rules,and a causal structure diagram is obtained.However,most of these methods may suffer from two problems in high-dimensional observable data: 1)They are only applicable to the case where the data set is large enough,and the efficiency of causal structure learning in high-dimensional data is low.2)They cannot completely distinguish Markov equivalence classes and can only obtain a partially directed causal graph.In the massive data of real scenes,the causal structure learned by such methods is unreliable.In order to reduce the difficulty of causal structure learning and improve the accuracy of causal relationship with high-dimensional data,a causal decomposition reconstruction algorithm(CADR)based on recursive divide-and-conquer methods is proposed in this study under the assumption of nonlinear non-Gaussian data,which effectively solved the problem of causal structure learning under high-dimensional small samples and improved the efficiency and the accuracy of causal structure learning.Specifically,the work of this study is as follows:(1)In the terms of learning efficiency of high-dimensional data structure,according to the central idea of recursive divide and conquer method,the high-dimensional data set are recursively decomposed into many smaller subsets until it can no longer be decomposed,or the size of the subset reaches the threshold.In this process,the algorithm reduces the space of the variable set,that is,the search space of the conditional candidate set in the conditional independent test and reduces the number of the conditional independent test.Thereby,it improves the efficiency,and obtains a more accurate causal structure skeleton in the lower dimension of the conditional candidate set.At the same time,to learn the causal skeleton more quickly and accurately under nonlinear data,CADR algorithm introduces the kernel conditional independence test method using random Fourier feature approximation.(2)In order to solve the problem that traditional constron-based methods cannot identify Markov equivalence classes,under the assumption of nonlinear non-Gaussian data,the causal direction of Markov equivalence classes can be effectively identified in the non-decomposable subset based on the irreversibility and identifiability of non-linear non-Gaussian model.Experiments on simulation data and real causal structure data show that CADR algorithm can not only improve the efficiency of learning causal structure of high-dimensional data,but also distinguish Markov equivalence classes and identify the causal direction effectively.The causal structure learned by CADR is accurate and reliable.
Keywords/Search Tags:Causal relationship, Recursive decomposition, High-dimensional data, Markov equivalence classes
PDF Full Text Request
Related items