
Efficient Bayesian Inference Algorithms In Deep Learning

Posted on: 2022-02-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L J Wen
Full Text: PDF
GTID: 1488306524470854
Subject: Computer Science and Technology
Abstract/Summary:
Deep learning, as an important branch of machine learning, is a class of algorithms that use hierarchical nonlinear transformations to perform pattern matching and prediction. In recent years, deep learning has been widely applied in image recognition, natural language processing and speech recognition, greatly improving the performance of algorithms and playing an increasingly important role in artificial intelligence. However, its defects in practical applications are also quite obvious: training often requires a large amount of labeled data; the computational complexity of deep models is high; deep learning algorithms are easy to attack; and deep learning lacks interpretability, with an internal working mechanism that remains unclear. Bayesian inference methods, backed by a large body of Bayesian statistical theory, can capture the process of data generation. Combining Bayesian inference with deep learning can provide a statistical interpretation for the model, improve robustness, and alleviate the above problems.

This dissertation studies how to effectively combine Bayesian inference methods and deep learning to remedy the defects of deep learning algorithms. Specifically, it studies in depth the estimation of mutual information in deep representation learning, the compression of recurrent neural network models, and the posterior collapse of variational autoencoders. The main research contents and contributions are as follows.

Mutual information (MI) plays an important role in unsupervised representation learning. Unfortunately, MI is intractable in continuous and high-dimensional settings. Recent advances establish tractable and scalable MI estimators to discover useful representations, but most existing methods cannot provide an accurate, low-variance estimate of MI. We argue that directly estimating the gradients of MI is more appealing for representation learning than estimating MI itself: from the viewpoint of optimization, we do not care about the value of entropy or mutual information, but rather about how to maximize or minimize it as part of the optimization objective. To this end, we propose the Mutual Information Gradient Estimator (MIGE) for representation learning, based on score estimation of implicit distributions. MIGE exhibits higher accuracy and lower variance for the gradient estimation of MI in high-dimensional and large-MI settings. We apply MIGE to both unsupervised learning of deep representations based on InfoMax and to the Information Bottleneck method. Extensive experiments on various deep learning tasks demonstrate the superiority of our method.
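To make the idea concrete, below is a minimal PyTorch-style sketch of the central building block of MI gradient estimation: estimating the gradient of the representation entropy H(Z) through a score estimator. The sketch uses a ridge-regularized Stein gradient estimator with an RBF kernel as an illustrative stand-in for the score estimator, assumes a deterministic encoder z = f_theta(x), and uses hypothetical function names; it is a sketch under these assumptions, not the dissertation's exact implementation.

import torch

def stein_score(z, eta=0.01):
    # Estimate the score grad_z log q(z) from a batch of samples z of shape (M, D)
    # with the ridge-regularized Stein gradient estimator and an RBF kernel.
    M = z.shape[0]
    d2 = torch.cdist(z, z).pow(2)
    sigma2 = d2.detach().median() / 2 + 1e-8            # median bandwidth heuristic
    K = torch.exp(-d2 / (2 * sigma2))                    # (M, M) kernel matrix
    # grad_K[i, d] = sum_j d k(z_i, z_j) / d z_{j, d}
    grad_K = (K.sum(dim=1, keepdim=True) * z - K @ z) / sigma2
    eye = torch.eye(M, dtype=z.dtype, device=z.device)
    return -torch.linalg.solve(K + eta * eye, grad_K)

def entropy_gradient_surrogate(z):
    # For a deterministic encoder z = f_theta(x), x ~ p(x):
    #   grad_theta H(Z) = -E[ score(z) * grad_theta f_theta(x) ],
    # so minimizing this surrogate performs gradient ascent on H(Z).
    # The estimated score is treated as a constant (detached) during backprop.
    score = stein_score(z.detach())
    return (score * z).sum(dim=1).mean()

The full MI gradient additionally involves the conditional-entropy term grad_theta H(Z|X), which is handled analogously; only the marginal-entropy part is shown here.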
Recurrent neural networks (RNNs) have recently achieved remarkable successes in a number of applications. However, the huge sizes and computational burden of these models make it difficult to deploy them on edge devices. A practically effective approach is to reduce the overall storage and computation costs of RNNs by network pruning. However, pruning methods based on Lasso produce irregular sparse patterns in weight matrices, which does not help practical speedup. To address these issues, we propose a structured sparse inference method based on neuron selection, which can remove neurons of RNNs independently. More specifically, we introduce two sets of binary random variables, which can be interpreted as gates or switches for the input neurons and the hidden neurons, respectively. We demonstrate that the corresponding optimization problem can be addressed by minimizing the L0 norm of the weight matrix, which is equivalent to applying spike-and-slab priors to the model parameters (a minimal gate sketch is given below). Finally, experimental results on language modeling and machine reading comprehension tasks indicate the advantages of the proposed method in comparison with state-of-the-art pruning methods.

Variational autoencoders (VAEs) are a kind of deep latent variable model widely used in generative modeling, semi-supervised learning and representation learning. Recent research shows that the prior plays an important role in density estimation. Although the standard Gaussian prior is usually used, this simple prior causes over-regularization, which is one cause of poor density estimation performance; this over-regularization is also known as the posterior-collapse phenomenon. To alleviate this problem, we propose the discrete aggregated prior, which can learn latent structural features from the data and better regularize the variational posterior. Specifically, we find the core points of the latent variables corresponding to all data in the latent space and aggregate them into a prior over the latent variables using Parzen-window density estimation. Compared with the standard Gaussian prior and the Gaussian mixture prior, the discrete aggregated prior achieves better generative performance.
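Two further illustrative sketches follow, in the same PyTorch style. First, for the neuron-selection pruning above: the abstract does not specify how the binary gates are relaxed, so this sketch uses the hard-concrete gate of Louizos et al. (2018), a common differentiable surrogate for L0/spike-and-slab gating; the class name and hyperparameters are assumptions, not the dissertation's exact construction.

import math
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    # One relaxed binary gate per neuron. Multiplying the input or hidden
    # activations of an RNN cell by these gates and penalizing their expected
    # L0 norm implements structured neuron selection.
    def __init__(self, n, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self):
        # Expected number of active gates, added to the task loss as the sparsity penalty.
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()

In use, the input and hidden activations of an RNN cell would be multiplied element-wise by two such gate vectors, with a total loss of the form task_loss + lam * (input_gate.expected_l0() + hidden_gate.expected_l0()).

Second, for the discrete aggregated prior: a minimal sketch of a Parzen-window prior over the latent space, assuming the latent core points have already been collected (for example by clustering encoded training data; the dissertation's exact core-point selection rule is not given in the abstract). The bandwidth h and the function name are illustrative.

import math
import torch

def log_parzen_prior(z, cores, h=0.2):
    # log p(z) under an equally weighted mixture of isotropic Gaussians
    # N(core_k, h^2 I) centered at K latent core points (Parzen window).
    # z: (B, D) latent samples, cores: (K, D) core points.
    D = z.shape[1]
    d2 = torch.cdist(z, cores).pow(2)                                   # (B, K)
    log_kernel = -0.5 * d2 / h ** 2 - 0.5 * D * math.log(2 * math.pi * h ** 2)
    return torch.logsumexp(log_kernel, dim=1) - math.log(cores.shape[0])

During VAE training, the KL term of the ELBO can then be estimated by Monte Carlo as (posterior.log_prob(z).sum(-1) - log_parzen_prior(z, cores)).mean(), replacing the closed-form KL against the standard Gaussian prior.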
Keywords/Search Tags: Bayesian method, Deep Learning, VAE, Latent Variable Models, Mutual Information