
Bounding Causal Effects For Fusing Multi-source Databases: A Methodological Study

Posted on: 2024-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y N He
GTID: 2544306923972239
Subject: Epidemiology and Health Statistics

Abstract/Summary:
Background: In observational studies, confounding factors can bias estimates of the causal effect of an exposure on an outcome. Researchers should therefore collect and use as many confounding factors as possible to recover the true causal effect. The growing number of medical databases offers opportunities to collect confounder information and control confounding bias, but existing data fusion methods usually assume that a single database contains all of the confounding information, an assumption that is rarely met in the real world. Causal inference research is therefore moving towards relaxing the strong assumptions that causal inference methods place on observational data by solving for upper and lower bounds on causal effects. Current bounding methods are mainly sensitivity-parameter-based methods and optimization-based methods. However, these methods compute causal effect bounds from a single dataset and do not use data fusion to exploit the confounding information held in other datasets. When multiple observational databases contain different confounding information, deriving causal effect bounds through data fusion is a new way to make full use of each database for causal inference.

Methods: This study addresses the situation in which multiple observational datasets contain different confounding factors that are correlated with one another. First, theoretical derivation is used to prove the inconsistency of estimating causal effect bounds without considering the bounds on the correlation between confounding variables. A method for fusing multiple databases, called Causal Effect Bounds based on Data Fusion (DF-CEB), is then proposed by combining theoretical derivation, statistical simulation, and real data analysis. Specifically, two observable datasets are assumed: Dataset 1 (containing the exposure, the outcome, and confounder 1) and Dataset 2 (containing the exposure, the outcome, and confounder 2), with the confounding variables correlated. Two causal graph models are constructed for two situations: (1) the two datasets in the continuous variable model share no common confounding variables; (2) the two datasets in the continuous variable model share common confounding variables. Based on the causal graph models and basic principles of causal inference such as the do-operator, the study derives expressions for the causal effect as a function of the correlation coefficient between the confounding variables in each situation. The DF-CEB method is then constructed using the boundary theory of correlation coefficients. Statistical simulation is used to generate data under different scenarios, traversing the model parameters, and the proposed method is used to calculate causal effect bounds that are compared with crude causal effect bounds. The evaluation indicators are the coverage rate of the causal effect bounds, the width of the bounds, and the proportion of bounds that exclude a zero effect. Finally, the proposed method is applied to explore the causal regulatory relationship between blood glucose and blood lipids, further validating the feasibility of the DF-CEB method.

Results: Theoretical derivation results: For different observational datasets that contain different, mutually correlated confounding variables, the study (1) demonstrated that it is inappropriate to calculate causal effect bounds without considering the correlation between confounding variables, and (2) used the complementarity of confounding variables across datasets, under the assumptions of homogeneity, potential exchangeability, and linear additive structural models, to construct two DF-CEB methods for the two scenarios. Simulation results: With the other parameters fixed at their initial values, the sample size and the effect parameter of each edge of the causal graph were traversed in turn. The causal effect bounds obtained by the DF-CEB method effectively covered the true causal effect values, and the sample size of the datasets did not affect the accuracy of the proposed method. Application results: The results suggest that blood glucose has a positive causal effect on blood lipids, which is consistent with recent research findings.

Conclusions: Based on the fundamental theory of causal inference, this study demonstrates that it is unreasonable to obtain causal effect bounds without considering the correlation bounds of the confounding variables. It then uses the complementarity of variables among multiple data sources to construct two DF-CEB methods for two different situations.
(1) In a continuous variable model, when the confounding factors between exposure and outcome are distributed across different databases and are correlated with one another, ignoring the tighter bound on the correlation coefficient between the confounding variables and directly calculating the causal effect bounds of exposure on outcome yields inaccurate causal effect bounds.
(2) Based on the assumptions of homogeneity, unconfoundedness, and linear additive structural models in causal inference theory, this study combines the ideas of data fusion and causal effect bounds to propose the DF-CEB method for estimating causal effect bounds from multiple databases. The results demonstrate that this method can provide accurate causal effect bounds.
(3) In both situations, the coverage probability of the DF-CEB method is not affected by the sample size of the databases or by the correlation coefficient between the confounding variables, demonstrating strong robustness and reliability. In addition, the proportion of bounds that exclude a zero effect depends on the size of the true causal effect: the larger the true causal effect, the higher the proportion of bounds that exclude zero, and the better the DF-CEB method performs.
(4) The DF-CEB method was applied to explore the causal regulatory relationship between blood glucose and blood lipids. The real data analysis showed that, after controlling for confounding factors such as BMI, age, blood pressure, and smoking years, there was a positive causal regulatory relationship between blood glucose and blood lipids.
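
The abstract does not state the DF-CEB formulas, so the following is only a minimal sketch of the kind of correlation-coefficient bound that the Methods appear to rely on, not the author's implementation. It assumes the standard result that the correlation between two variables that are never observed together is restricted by their correlations with a shared third variable, because the 3x3 correlation matrix must be positive semi-definite. The function names `correlation_bounds` and `intersect_bounds` and the example correlation values are hypothetical.

```python
import math
from typing import Tuple


def correlation_bounds(r_ax: float, r_bx: float) -> Tuple[float, float]:
    """Feasible range of corr(A, B) given corr(A, X) and corr(B, X).

    Requiring the correlation matrix of (A, B, X) to be positive
    semi-definite gives
        r_ax * r_bx - s <= corr(A, B) <= r_ax * r_bx + s,
    where s = sqrt((1 - r_ax**2) * (1 - r_bx**2)).
    """
    for r in (r_ax, r_bx):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    centre = r_ax * r_bx
    half_width = math.sqrt((1.0 - r_ax ** 2) * (1.0 - r_bx ** 2))
    return max(-1.0, centre - half_width), min(1.0, centre + half_width)


def intersect_bounds(*intervals: Tuple[float, float]) -> Tuple[float, float]:
    """Intersect several (lower, upper) intervals for the same quantity."""
    lower = max(lo for lo, _ in intervals)
    upper = min(hi for _, hi in intervals)
    if lower > upper:
        raise ValueError("the supplied intervals do not overlap")
    return lower, upper


# Hypothetical inputs: confounder 1 is observed only in Dataset 1 and
# confounder 2 only in Dataset 2, but both datasets contain the exposure X
# and the outcome Y, so each shared variable yields its own interval.
via_exposure = correlation_bounds(0.6, 0.5)  # corr(C1, X), corr(C2, X)
via_outcome = correlation_bounds(0.8, 0.7)   # corr(C1, Y), corr(C2, Y)
feasible = intersect_bounds(via_exposure, via_outcome)
print("corr(C1, C2) lies in [%.3f, %.3f]" % feasible)
```

Each shared variable gives a necessary condition, so the intersection is a valid (possibly not the sharpest) range for the unobserved correlation. A causal effect expression that depends on this correlation could then be evaluated over the range to obtain bounds on the effect.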
Keywords/Search Tags: Causal inference, Confounding control, Data fusion, Causal inference boundary