
Research On Causal Inference Method For High-dimension Missing Data

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: H F Li
GTID: 2428330602488600
Subject: Software engineering
Abstract/Summary:
With the development of artificial intelligence and big data, discovering causality from observational data has become an important issue in many research fields. Causal inference is a powerful modeling tool for interpretive analysis and can make current machine learning models more interpretable. It has important applications in fields such as medicine, communications, the Internet, statistics, and economics. Research on causal inference has moved beyond the two-variable case and begun to study how to learn causal network structure in high-dimensional data. However, it remains difficult to learn the causal network structure in high-dimensional data with traditional causal inference algorithms while keeping the learned structure accurate. Complex high-dimensional data also often contain many missing and abnormal values, and if these are handled poorly, the accuracy of causal inference suffers directly. To address these problems, we improve the causal inference algorithm for high-dimensional data in two parts. The two parts and their innovations are:

1. To address outliers in high-dimensional data, we propose a two-step causal inference algorithm for high-dimensional data based on the coupling correlation coefficient (CDC). First, the algorithm uses the CDC, which is robust to outliers, to detect correlations between variables, improving the accuracy of the parent-child node set of the target node. It then uses conditional independence (CI) tests to further refine the parent-child set and delete irrelevant nodes. Next, it uses a nonlinear least-squares independence regression algorithm to orient the causal direction between the target node and each of its parent and child nodes. Finally, it iterates over all nodes to obtain the complete causal network structure.

2. Running existing causal inference algorithms directly on high-dimensional data with missing values may lead to incorrect inference. In recent years, deep learning techniques for filling in missing data have become increasingly mature and reliable. Building on this, we combine two deep learning frameworks, GAN and GAE, which are applied iteratively to missing-data imputation and causal skeleton learning, respectively.

Experimental results show that the algorithm improves the accuracy of causal network structure learning on high-dimensional data. On large-sample data sets, its time complexity is also lower than that of traditional algorithms, and it is robust to outliers. Simulations on synthetic data further show that its causal inference performance under different missing-data mechanisms is better than that of existing methods.
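The screening and refinement stages described in item 1 (robust correlation screening followed by conditional independence tests) can be illustrated with a minimal sketch. It is not the thesis's implementation: Spearman rank correlation stands in for the CDC, a Fisher z-test on first-order partial correlations stands in for the CI test, and all function names and the synthetic data are hypothetical. The direction-orientation step is omitted.

```python
import math
import random


def ranks(values):
    """Return 0-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order):
        result[i] = float(rank)
    return result


def pearson(x, y):
    """Plain Pearson correlation; applied to ranks it gives Spearman's rho."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(rxy, rxz, ryz):
    """First-order partial correlation of x and y given z."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def is_dependent(r, n, n_cond=0, z_crit=3.29):
    """Fisher z-test: True if r differs significantly from zero."""
    r = max(min(r, 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - n_cond - 3)
    return abs(z) > z_crit


def parent_child_candidates(data, target):
    """data maps variable name -> list of samples; returns refined candidates."""
    n = len(data[target])
    rk = {name: ranks(col) for name, col in data.items()}

    def corr(a, b):
        return pearson(rk[a], rk[b])

    # Step 1: keep variables whose rank correlation with the target is significant.
    cand = [v for v in data if v != target and is_dependent(corr(target, v), n)]

    # Step 2: drop a candidate if some other candidate renders it
    # conditionally independent of the target.
    kept = []
    for v in cand:
        separated = any(
            not is_dependent(
                partial_corr(corr(target, v), corr(target, z), corr(v, z)),
                n, n_cond=1)
            for z in cand if z != v
        )
        if not separated:
            kept.append(v)
    return kept


def make_demo_data(n=1000, seed=0):
    """Synthetic chain D -> A -> T -> C, unrelated B, and gross outliers in T."""
    rng = random.Random(seed)
    data = {name: [] for name in "DATCB"}
    for i in range(n):
        d = rng.gauss(0, 1)
        a = d + rng.gauss(0, 1)
        t = a + rng.gauss(0, 1)
        c = t + rng.gauss(0, 1)
        b = rng.gauss(0, 1)
        if i % 100 == 0:
            t += 100.0  # contaminate 1% of the target's samples
        for name, value in zip("DATCB", (d, a, t, c, b)):
            data[name].append(value)
    return data


if __name__ == "__main__":
    print(parent_child_candidates(make_demo_data(), "T"))
```

In the synthetic chain, D is correlated with T only through A, so the refinement step is expected to remove it while keeping the parent A and the child C. Using ranks keeps the few gross outliers in T from dominating the correlation estimates, which is the role the abstract assigns to the CDC.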
Keywords/Search Tags:causal inference, high-dimensional data, abnormal-missing, CDC, deep learning