
Causal Inference Method Of Mixed Data Based On Attribution Index And Its Application

Posted on: 2022-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhou
Full Text: PDF
GTID: 2518306344496904
Subject: Computer technology
Abstract/Summary:
With the rapid accumulation of massive data, mining the underlying causal mechanisms from that data has become particularly urgent. In recent years, the study of causality has advanced across many fields, including clinical medicine, genomics, epidemiology, and space physics, and has received extensive attention. As a basic task of data science research, causal inference plays an important role in interpretation, prediction, decision-making, and control. Data collected in real scenarios are often of mixed types and include latent variables that are not observed. However, most existing causal inference frameworks cannot handle hidden variables and consider only a single data type. When applied to real-world data, such algorithms easily mistake spurious associations for causal relationships, which reduces the credibility of the learned causal network structure. Based on this, the research contents of this thesis are as follows:

1. To address causal discovery on mixed data types and the identification of latent variables, this thesis proposes a causal inference algorithm for mixed data types (NCICR), built on an independence test for mixed data sets and the normalized causal index (NCI). For pairs of mixed-type variables (the bivariate case), the algorithm improves the accuracy of identifying the causal direction by computing the normalized causal index from algorithmic information theory. Based on the FCI algorithm, NCICR combines a conditional independence test for mixed data with the attribution index. In the neighbor-search step, the algorithm starts from a complete undirected graph, performs independence tests on the mixed data set, and removes the edges between unrelated nodes to obtain an initial undirected graph. Then the minimum description length is used to approximate Kolmogorov complexity, and the adjacent nodes of each variable are searched to estimate the attribution index in both directions, X → Y and Y → X. The causal direction is identified by comparing the magnitudes of the two estimates, and this is repeated until all nodes have been processed, yielding the causal network structure. The algorithm is tested on simulated data and real data. The experimental results show that NCICR handles causal inference on mixed data types containing confounding factors comparatively well.

2. To further verify the applicability of NCICR in practice, the algorithm is applied to the field of medical prognosis. Most medical statistics process data through correlation analysis alone, which cannot identify spurious associations caused by latent confounding factors and therefore leads to misidentified causal relationships. The causal relationships in observed real data sets of lung cancer patients and ovarian cancer patients are studied in detail, and the causal structure network of the case data is obtained. The results show that, in terms of correlation, the proposed algorithm is consistent with the conclusions drawn by conventional statistical methods. In terms of causality, NCICR obtains a causal relationship network between the variables in the medical record data, with clear causal directions and some ability to recognize latent variables. From the perspective of causal inference, modeling the causal relationships among factors that affect the prognosis of cancer treatment can improve the accuracy of judging patient prognosis, and it also provides a new method for causal identification on medical observational data.
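The two stages described in the abstract (pruning a complete undirected graph with independence tests, then orienting each remaining edge by comparing a description-length score in both directions) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the NCICR algorithm itself: the abstract does not define the NCI formula, so `direction_score` is a hypothetical stand-in, `zlib` compression is used as a crude proxy for Kolmogorov complexity, the independence test is passed in by the caller, and conditioning sets and latent-variable handling from FCI are omitted.

```python
import zlib
from itertools import combinations


def code_length(data: bytes) -> int:
    """Compressed size in bytes: a crude, computable proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def conditional_code_length(target: bytes, given: bytes) -> int:
    """Approximate K(target | given) as C(given + target) - C(given), floored at 0."""
    return max(code_length(given + target) - code_length(given), 0)


def direction_score(cause: bytes, effect: bytes) -> float:
    """Hypothetical normalized score for cause -> effect (not the thesis's NCI).

    Smaller means the pair is described more compactly in this direction.
    """
    total = code_length(cause) + conditional_code_length(effect, cause)
    return total / (code_length(cause) + code_length(effect))


def build_skeleton(names, independent):
    """Start from the complete undirected graph and drop every edge
    whose endpoints the supplied test reports as independent."""
    return [(a, b) for a, b in combinations(names, 2) if not independent(a, b)]


def orient(skeleton, columns, score=direction_score):
    """Orient each remaining edge by comparing the score in both directions."""
    oriented = []
    for a, b in skeleton:
        forward = score(columns[a], columns[b])
        backward = score(columns[b], columns[a])
        if forward < backward:
            oriented.append((a, "->", b))
        elif backward < forward:
            oriented.append((b, "->", a))
        else:
            oriented.append((a, "--", b))  # tie: leave the edge undirected
    return oriented
```

Here `columns` maps each variable name to its observations serialized as bytes, and `independent(a, b)` is whatever mixed-data independence test the caller chooses. Passing the score as a parameter keeps the orientation step separate from any particular complexity estimator, so a proper MDL-based index could be substituted for the compression proxy.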
Keywords/Search Tags: causal inference, normalized causal indicator, mixed data, confounding factors, cancer