Investigation On A Method Of Causal Direction Determination For Heterogeneous Data

Posted on: 2022-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Bai
Full Text: PDF
GTID: 2518306539462454
Subject: Computer technology
Abstract/Summary:
Discovering causal relationships between variables of interest from observational data is a problem that has recently attracted much attention from researchers. In many real scenarios, because of the passage of time, multiple sources, and differing data collection methods, a dataset can usually be divided into sub-datasets that follow different distributions. Across these sub-datasets, the data generation mechanism underlying the causal relationship also changes. Most existing causal discovery methods implicitly assume a single, fixed data generation mechanism, so they cannot recover the correct causal function and therefore cannot obtain the correct causal relationship from heterogeneous datasets.

In causal discovery research, the distribution of the cause variable and the conditional distribution of the effect variable given the cause variable are called causal modules. Modularity is the property that a change in one causal module does not propagate to the other causal module. This property does not hold in the opposite (anti-causal) direction. Therefore, the independence between the candidate causal modules can be tested in both candidate causal directions to determine the causal direction between variables in a heterogeneous dataset.

The main work completed in this thesis is as follows:

(1) First, we formally define the problem of causal direction determination on heterogeneous data and analyze the shortcomings of existing causal direction determination methods on this problem. We then propose using modularity to determine the correct causal direction between variables in a heterogeneous dataset.

(2) Second, following this idea, we define a metric called causal module entropy that describes the distribution parameters of the different types of causal modules. Based on causal module entropy and the preceding theoretical analysis, we propose two causal direction determination methods for heterogeneous data: the maximum independence of entropy method and the minimum difference method. We theoretically analyze the ability of the proposed methods to determine the causal direction and then modify the minimum difference method.

(3) We perform experiments on simulated data and real datasets to verify the performance of the maximum independence of entropy method and the minimum difference method, and we record and plot the experimental results. The results show that in each group of experiments, the maximum independence of entropy method can determine the causal direction between variables in heterogeneous datasets. Across the situations covered by the experiments, the minimum difference method achieves the best performance; in most cases, it performs much better than the baseline methods and the maximum independence of entropy method. The experimental results clearly verify the effectiveness and robustness of the proposed methods.
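
The modularity idea described above can be illustrated with a small toy sketch. This is not the thesis's causal module entropy, nor its maximum independence of entropy or minimum difference methods. It is a minimal illustration under assumed conditions: a linear relationship with Gaussian noise, one summary parameter per module (log-variance of the putative cause, log absolute regression slope), and absolute Pearson correlation across sub-datasets as a crude dependence score. All function names here are hypothetical.

```python
import numpy as np


def module_params(cause, effect):
    """Summarise one sub-dataset's two candidate causal modules.

    Returns (log-variance of the putative cause, log |slope| of the effect
    regressed on the cause). Both summaries are illustrative assumptions.
    """
    slope = np.polyfit(cause, effect, 1)[0]  # highest-degree coefficient first
    return np.log(np.var(cause)), np.log(abs(slope))


def dependence(a, b):
    """Absolute Pearson correlation across sub-datasets (crude dependence score)."""
    return abs(np.corrcoef(a, b)[0, 1])


def infer_direction(domains):
    """domains: list of (x, y) array pairs, one pair per sub-dataset."""
    fwd = np.array([module_params(x, y) for x, y in domains])  # candidate X -> Y
    bwd = np.array([module_params(y, x) for x, y in domains])  # candidate Y -> X
    dep_xy = dependence(fwd[:, 0], fwd[:, 1])
    dep_yx = dependence(bwd[:, 0], bwd[:, 1])
    # Prefer the direction whose module parameters look less dependent.
    direction = "X->Y" if dep_xy < dep_yx else "Y->X"
    return direction, dep_xy, dep_yx


def simulate_domains(n_domains=200, n_samples=500, seed=0):
    """Toy heterogeneous data whose true direction is X -> Y."""
    rng = np.random.default_rng(seed)
    domains = []
    for _ in range(n_domains):
        sigma_x = np.exp(rng.normal(0.0, 0.5))  # cause module varies per sub-dataset
        slope = np.exp(rng.normal(0.0, 0.5))    # mechanism varies independently of it
        x = rng.normal(0.0, sigma_x, n_samples)
        y = slope * x + rng.normal(0.0, 0.1, n_samples)
        domains.append((x, y))
    return domains


if __name__ == "__main__":
    print(infer_direction(simulate_domains()))
```

In this toy setting the cause's variance and the mechanism's slope are drawn independently for each sub-dataset, so their estimates are nearly uncorrelated in the true direction, while the parameters fitted in the reverse direction both depend on the same underlying quantities and become correlated.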
Keywords/Search Tags: causal discovery, multi-domain, modularity, causal direction determination