| Inferring causal relationships from observational data is an important research topic in machine learning and statistics. As the application areas of causal inference grow, learning causal relationships from high-dimensional observational data remains a difficult problem. Causal theory reveals useful information hidden in observed data and opens new directions for machine learning research. Most machine learning methods lack interpretability, and causal discovery is regarded as a modeling tool that can make them interpretable. To address these two problems, this work makes the following contributions. First, to handle high-dimensional observational data, and inspired by the inference process of the PC algorithm, which is based on conditional independence tests, we propose a causal inference algorithm that is widely applicable and maintains good performance on high-dimensional data. The algorithm constructs a causal network in two stages: causal skeleton construction and causal direction identification. In the skeleton construction stage, the mRMR method is used to search incrementally for the degree of dependency between variables, with the aim of finding the set of factors that has the greatest dependency on the target node. Conditional independence tests are then used to remove irrelevant nodes from the candidate factor set, and iterating this procedure yields the true causal skeleton. In the direction identification stage, the IGCI algorithm determines the causal direction between nodes, from which the complete causal network structure can be inferred. Comparison with other causal inference methods shows that the proposed algorithm outperforms them in both computation time and accuracy. Second, lung cancer currently has the highest incidence and mortality among malignant tumors and is a serious threat to human health. Accurately predicting the survival of cancer patients is a current research hotspot in clinical medicine. Machine learning has recently achieved significant results on medical data, but in many cases the models cannot explain their results and are unstable. Therefore, the causal inference algorithm proposed in this paper (MRCI) is used as a factor selection method and combined with a deep neural network (DNN) to predict the survival time of lung cancer patients, using data available from local hospitals. The method uses the causal inference algorithm to construct a causal network of pathological factors and patient survival time and selects the main factors from this network. A DNN model is then built on the selected factors to predict the survival time of lung cancer patients. The data and experiments indicate that lung cancer stage, radiotherapy, smoking, PLR, lung cancer type, and NLR are the main factors that directly affect the survival time of lung cancer patients. Experimental comparison shows that screening the main factors with the causal inference-based approach and applying them to DNN prediction performs better than other factor screening methods. |
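
To make the direction identification stage more concrete, the following is a minimal sketch of one common IGCI variant, the slope-based estimator with a uniform reference measure. The abstract does not say which IGCI variant or normalization is used, so this choice is an assumption, and the names `igci_direction`, `_rescale`, and `_slope_score`, as well as the returned labels, are illustrative only. It is not the implementation evaluated in this work.

```python
import math
import random


def _rescale(values):
    """Min-max rescale to [0, 1] (assumed uniform reference measure)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("variable is constant; direction is undefined")
    return [(v - lo) / (hi - lo) for v in values]


def _slope_score(cause, effect):
    """Mean log absolute slope of `effect` against `cause`, ordered by `cause`."""
    pairs = sorted(zip(cause, effect))
    total, count = 0.0, 0
    for (c0, e0), (c1, e1) in zip(pairs, pairs[1:]):
        dc, de = c1 - c0, e1 - e0
        if dc != 0 and de != 0:  # skip ties, where the log slope is undefined
            total += math.log(abs(de / dc))
            count += 1
    if count == 0:
        raise ValueError("not enough distinct points")
    return total / count


def igci_direction(x, y):
    """Return 'X->Y', 'Y->X', or 'undecided' for two paired samples.

    The direction with the smaller slope score is taken as the causal one.
    """
    x, y = _rescale(x), _rescale(y)
    c_xy = _slope_score(x, y)  # score for the hypothesis X -> Y
    c_yx = _slope_score(y, x)  # score for the hypothesis Y -> X
    if c_xy < c_yx:
        return "X->Y"
    if c_yx < c_xy:
        return "Y->X"
    return "undecided"


if __name__ == "__main__":
    # Toy data: a noise-free, monotone mechanism y = x^3 with uniform x.
    random.seed(0)
    x = [random.random() for _ in range(500)]
    y = [v ** 3 for v in x]
    print(igci_direction(x, y))
```

In a two-stage procedure such as the one described above, a function of this kind would be applied to each edge of the undirected skeleton to orient it. Real clinical variables are noisy and often discrete, so a practical version would need tie handling and some robustness checks beyond this sketch.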