Online Streaming Feature Selection Algorithms: From Correlation to Causality

Posted on: 2022-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Li
Full Text: PDF
GTID: 2518306773467944
Subject: Automation Technology
Abstract/Summary:
The development of the Internet and information technology has brought about explosive growth in data. Faced with such high-dimensional and redundant data sets, it is crucial to obtain concise and reliable information. As an effective data preprocessing method, feature selection picks the feature subset that carries the most discriminative information, and it has attracted extensive attention. However, traditional feature selection algorithms require the feature space to be static, i.e., all features must be available at the beginning, which is inconsistent with many practical applications. For example, new hot topics emerge all the time while old topics become obsolete; this emergence and demise shows that the feature space is dynamic rather than static. The uncertainty and evolution of the feature space have driven the growth of online streaming feature selection algorithms. In the streaming feature setting, features flow into the feature space one after another over time while the sample space remains unchanged, and each feature is processed online as it arrives.

At present, online streaming feature selection algorithms mainly focus on mining the correlation between data, that is, the statistical relationship between phenomena. However, correlation only reveals surface relationships in the data; beneath it lies a deeper, more essential relationship: causality. Discovering the causality hidden in the data helps to build a more robust and interpretable classification model, which better meets the needs of real-world applications. In this thesis, to address some shortcomings of existing online streaming feature selection algorithms, we start from correlation, then mine the latent causality in the data, and explore online streaming feature selection algorithms from correlation to causality. The main research contents are as follows:

(1) Online streaming feature selection algorithm using neighborhood information interaction. Traditional online streaming feature selection algorithms only consider the correlation between features and labels, ignoring the interaction between features. To address this issue, a new online streaming feature selection algorithm using neighborhood information interaction is proposed, based on the concept of feature interaction. The process consists of two stages: online interaction analysis and online redundancy judgment. In the online interaction analysis stage, features with strong interaction ability are selected. Then, in the online redundancy judgment stage, redundant features in the currently selected feature subset are removed. The result is a feature subset with strong interaction and weak redundancy. Experimental results on 10 datasets verify the effectiveness of the algorithm.

(2) Causality-based online streaming feature selection with neighborhood conditional mutual information. Most traditional online streaming feature selection algorithms concentrate only on the correlation between data and ignore the underlying causality. To deal with this issue, a new online streaming feature selection algorithm using neighborhood conditional mutual information is proposed, based on causality and Bayesian networks. The algorithm uses causality to find the Markov blanket of the class label. Then, to obtain the theoretically optimal feature subset, neighborhood conditional mutual information is used in place of conditional independence tests to remove false positives. Finally, a more interpretable and robust classification model is constructed. Experiments on 13 datasets show the superiority of the algorithm.
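
To make the two-stage streaming idea concrete, the following is a minimal Python sketch and not the thesis's algorithm. It assumes one common form of neighborhood mutual information from the neighborhood rough set literature, NMI(S; y) = -(1/n) Σ log2(|δ_S(x_i)| · |δ_y(x_i)| / (n · |δ_{S,y}(x_i)|)), with Chebyshev-distance neighborhoods of radius `delta`. Stage 1 here is a plain relevance test standing in for the interaction analysis described above, and stage 2 is a simple redundancy check. The function names, the thresholds `relevance` and `tol`, and the toy data are all hypothetical. The Markov blanket search of contribution (2) is not covered.

```python
import numpy as np


def neighborhood_mutual_information(X, y, delta=0.15):
    """NMI between a feature subspace X (n samples) and discrete labels y.

    Assumed definition: sample j is a neighbor of sample i when their
    Chebyshev distance in X is at most delta; label neighbors share a class.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y)
    n = len(y)
    dist = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)
    near_x = dist <= delta                 # neighbors in feature space
    same_y = y[:, None] == y[None, :]      # neighbors in label space
    nx = near_x.sum(axis=1)
    ny = same_y.sum(axis=1)
    nxy = (near_x & same_y).sum(axis=1)    # at least 1: a sample is its own neighbor
    return float(-np.mean(np.log2(nx * ny / (n * nxy))))


def stream_select(feature_stream, y, delta=0.15, relevance=0.05, tol=1e-3):
    """Process (name, column) pairs one at a time; return selected names."""
    selected = {}  # insertion-ordered: name -> column
    for name, column in feature_stream:
        # Stage 1 (assumed stand-in): discard features with little label information.
        if neighborhood_mutual_information(column, y, delta) <= relevance:
            continue
        selected[name] = np.asarray(column, dtype=float)
        # Stage 2: drop a feature if removing it leaves the joint NMI unchanged.
        # Newest features are checked first, so older ones are preferred.
        for other in reversed(list(selected)):
            if len(selected) == 1:
                break
            names = list(selected)
            rest = [k for k in names if k != other]
            full = neighborhood_mutual_information(
                np.column_stack([selected[k] for k in names]), y, delta)
            reduced = neighborhood_mutual_information(
                np.column_stack([selected[k] for k in rest]), y, delta)
            if full - reduced <= tol:
                del selected[other]
    return list(selected)


# Toy data (hypothetical): one informative feature, one feature unrelated
# to the labels, and one near-duplicate of the informative feature.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
informative = np.array([0.0, 0.05, 0.1, 0.05, 0.9, 0.95, 1.0, 0.95])
alternating = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
stream = [("a", informative), ("noise", alternating), ("b", informative + 0.01)]

print(stream_select(stream, y))  # -> ['a']
```

On this toy stream the unrelated feature is rejected in stage 1, and the near-duplicate is removed in stage 2 because it adds nothing to the joint neighborhood mutual information. A pure relevance test would miss features that are only informative in combination, which is the gap that interaction analysis is meant to close.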
Keywords/Search Tags: online streaming feature selection, neighborhood mutual information, correlation, causality, Markov blanket