| The goal of variable selection is to identify the independent variables that are related to the dependent variable while allowing the model to maintain high predictive accuracy. This process not only reduces the computational complexity of the model but also improves its generalization ability, so that it performs better on unseen datasets. In this paper, a new variable selection method is proposed based on variational information bottleneck theory. The information bottleneck (IB) aims to find the optimal representation of the input variables with respect to the response variable. Although IB has been widely used in the machine learning community, information-theoretic approaches to variable selection have rarely been reported. Finding the optimal representation essentially amounts to eliminating useless information, which is consistent with the aim of variable selection. Motivated by this idea, we investigate deep neural networks for variable selection through an information-theoretic lens. Specifically, we first justify variable selection with IB and then propose a new statistic to measure variable importance using the Drop-Out-One Loss method. On this basis, a new algorithm based on the deep variational information bottleneck (Deep VIB) is developed to calculate the statistic, in which we consider the Gaussian distribution and the exponential distribution to estimate the Kullback-Leibler divergence. Using the proposed algorithm, we conducted 20 repeated experiments on four simulated datasets. The experimental results show that, in most cases, our algorithm can effectively select all variables and performs better than most of the reference algorithms. The experimental results on the UCI real dataset demonstrate that our algorithm can identify the added pseudo variables as the least relevant ones and is more stable than most of the comparative algorithms. Subsequently, the proposed variable selection
algorithm is applied to the nickel-based superalloy dataset and the cast superalloy dataset. The experimental results on the nickel-based superalloy dataset show that the contents of the elements Nb and Ta have a significant influence on the area fraction of precipitates at different temperatures, while the results on the cast superalloy dataset show that the contents of the elements W and Al, as well as the environment temperature, have a significant influence on tensile performance, and the contents of the elements Cr and Al, as well as the environment temperature, have a significant influence on creep performance. |
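
The drop-one idea behind the importance statistic can be illustrated with a minimal sketch. The abstract does not define the statistic or the network, so everything below is an assumption made for illustration: an ordinary least-squares model stands in for Deep VIB, mean squared error stands in for the loss, and the names `fit_mse` and `drop_one_importance` are hypothetical. The score of a variable is taken to be the increase in loss when the model is refit without that variable.

```python
import numpy as np


def fit_mse(X, y):
    """Fit least squares with an intercept and return the training MSE.

    Assumption: a linear model is used here only as a simple stand-in
    for the Deep VIB network described in the abstract.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.mean(resid ** 2))


def drop_one_importance(X, y):
    """Score each variable by the loss increase after dropping it.

    Hypothetical definition: score_j = loss(without x_j) - loss(all x).
    """
    full_loss = fit_mse(X, y)
    scores = []
    for j in range(X.shape[1]):
        X_reduced = np.delete(X, j, axis=1)  # remove column j
        scores.append(fit_mse(X_reduced, y) - full_loss)
    return np.array(scores)


# Synthetic data: only the first two variables influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=500)

scores = drop_one_importance(X, y)
ranking = np.argsort(scores)[::-1]  # most important variable first
print("importance scores:", scores)
print("ranking:", ranking)
```

In this toy setting the two informative variables should receive the largest scores, while the two pseudo variables should score close to zero, which mirrors the pseudo-variable check described for the UCI dataset.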