Feature selection reduces dimensionality by filtering out irrelevant and redundant features in high-dimensional data, and it has become an essential pre-processing step in data mining and causal discovery, especially for streaming data. However, classical feature selection methods focus mainly on the correlation between predictive features and the target feature (class label), so the features they select offer weak interpretability in terms of causal relationships. Explicit causal relationships improve both the interpretability and the predictive ability of the resulting models. Causal feature selection seeks to identify the Markov blanket (MB) of a target feature in a Bayesian network (BN); the MB consists of the target's direct causes (parents), its direct effects (children), and the other parents of its children (spouses). Identifying the MB helps improve classification and prediction performance and is crucial for local causal discovery around the target feature of interest. This paper studies causal feature selection and local causal discovery methods for fixed and streaming data. The main contributions are as follows. First, existing causal feature selection methods for fixed data do not balance computational efficiency against prediction accuracy across low- to high-dimensional data. To overcome this, a novel method called Feature selection via Mining Markov blanket (FSMB) is proposed, which improves both prediction accuracy and efficiency and strikes a balance between the two. FSMB mines an MB consisting of parents and children (PC) and spouses. For a given target feature T, it uses a forward strategy to collect true-positive PCs and then removes false-positive PCs from the PC set. At the same time, FSMB finds the spouses of T by exhaustively searching the non-PC set with a V-structure strategy. Second, existing feature selection methods for streaming data can mine only an approximate MB, and their mining of spouses is incomplete. An MB-based online feature selection method called Online Feature Selection Via Markov Blanket (OFSVMB) is therefore proposed. It applies a null-conditional test to incoming streaming features, feature relevance analysis to find true-positive PCs and spouses, and feature redundancy analysis to remove false-positive or irrelevant features, thereby mining the MB from streaming features in real time. Third, existing local causal discovery methods do not apply to high-dimensional data in a dynamic feature space. To address this problem, a method called Local Causal Discovery (LoCaD) is proposed to handle both fixed and streaming data. LoCaD distinguishes, in real time, the parents from the children and the PCs from the spouses of the target feature by dynamically integrating V- and N-structures. In addition, LoCaD is designed to improve and balance computational efficiency and prediction accuracy. Finally, the effectiveness of FSMB, OFSVMB, and LoCaD is verified on high-dimensional data from spam filtering, yeast cell downscaling, and lung cancer, respectively.
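To make the Markov blanket definition used above concrete, the following minimal Python sketch reads the parents, children, and spouses of a target off a DAG whose structure is already known. It is not FSMB, OFSVMB, or LoCaD, which must recover the MB from data. The toy graph, the `PARENTS` mapping, and the `markov_blanket` helper are hypothetical names introduced only for this illustration.

```python
# Hypothetical toy DAG (not taken from the paper), stored as node -> set of parents.
PARENTS = {
    "A": set(),
    "B": set(),
    "T": {"A"},        # A -> T: A is a parent (direct cause) of T
    "C": {"T", "B"},   # V-structure T -> C <- B: C is a child of T, B is a spouse
    "D": {"C"},        # D is a grandchild of T and lies outside the MB
}

def markov_blanket(parents, target):
    """Return (parents, children, spouses) of `target` in a known DAG."""
    pa = set(parents[target])
    ch = {node for node, ps in parents.items() if target in ps}
    sp = set()
    for child in ch:
        # Other parents of each child are spouses of the target.
        sp |= parents[child] - {target}
    # Nodes already in the PC set are not listed again as spouses.
    sp -= pa | ch
    return pa, ch, sp

pa, ch, sp = markov_blanket(PARENTS, "T")
mb = pa | ch | sp
print(sorted(mb))  # ['A', 'B', 'C']
```

In this toy graph the spouse B is found only through the V-structure T -> C <- B, which illustrates why spouse discovery is described above as a search over the non-PC set.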