
Research On Learning Algorithms For Restricted Boltzmann Machines

Posted on: 2017-08-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X S Ma
Full Text: PDF
GTID: 1318330518995985
Subject: Intelligent Science and Technology
Abstract/Summary:
Restricted Boltzmann Machines (RBMs) are generative models that can be interpreted as stochastic neural networks. Recently, they have attracted much attention as core building blocks of deep neural networks, including Deep Belief Networks and Deep Boltzmann Machines. Such deep architectures have become an important class of deep learning models and have been applied to handwritten digit recognition and image recognition tasks. However, Contrastive Divergence (CD), the most popular learning algorithm for RBMs, is an approximation algorithm. Its convergence, learning speed, and learning performance have attracted increasing attention, yet many problems remain to be studied. This thesis studies these problems, obtains new convergence conditions for the algorithm, and proposes an improved learning algorithm for RBMs. Specifically, the main contributions of this thesis are as follows:

The thesis derives the relationship between the CD update and the true log-likelihood gradient, showing that CD is a biased estimator of the log-likelihood gradient. The Persistent Contrastive Divergence (PCD) algorithm is studied in the same way, and a similar relationship between PCD and the log-likelihood gradient is obtained. The thesis analyzes the approximation bias and the approximation error of both algorithms: the approximation bias is a term that tends to zero, while the approximation error contains not only the approximation bias but also a term with zero expectation. These results generalize the conclusions of Bengio and Delalleau (2009).

The thesis proposes a gradient learning algorithm with mixed errors, which provides an effective framework for studying the convergence of the CD algorithm from the viewpoint of gradient learning. The mixed error is a mixture of a deterministic error and a stochastic error. The convergence of this gradient algorithm is studied in the case where the gradient of the objective function is Lipschitz continuous, and convergence conditions are given that constrain the learning rate, the errors, and other quantities.

Based on the convergence conditions of the gradient learning algorithm with mixed errors, the convergence of the CD algorithm is studied and corresponding convergence conditions are given. More concise convergence conditions are obtained by specifying the learning rate. The thesis also considers the convergence of the CD algorithm for RBMs in which both the hidden and visible variables take only a finite number of values, and derives the relationship that the number of Gibbs sampling steps and the learning rate should satisfy to guarantee convergence. Experimental results are given to illustrate this relationship.

The thesis proposes a new learning algorithm for RBMs called Average Contrastive Divergence (ACD), which directly approximates the Expectation of CD (ECD). ECD and CD are both biased estimators of the log-likelihood gradient, and the approximation error of ECD is smaller than that of CD: the approximation error of CD includes not only the approximation error of ECD but also a stochastic error term. The thesis proves that, under ordinary circumstances, the approximation error of the ACD algorithm is less than or equal to that of the traditional CD algorithm with probability 1. That is, ACD is closer to the true log-likelihood gradient than the traditional CD algorithm.
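For reference, the standard relationship behind these results can be sketched as follows; the notation here is generic, and the explicit averaging form shown for ACD is an assumption based on the description above, not the thesis's exact formulation. For an RBM with energy E(v,h;\theta), the log-likelihood gradient and the CD-k update for a training example v_0 are

    \frac{\partial \log p(v_0)}{\partial \theta}
      = -\mathbb{E}_{p(h \mid v_0)}\!\left[\frac{\partial E(v_0,h)}{\partial \theta}\right]
        + \mathbb{E}_{p(v,h)}\!\left[\frac{\partial E(v,h)}{\partial \theta}\right],

    \Delta\theta_{\mathrm{CD}_k}
      = -\mathbb{E}_{p(h \mid v_0)}\!\left[\frac{\partial E(v_0,h)}{\partial \theta}\right]
        + \mathbb{E}_{p(h \mid v_k)}\!\left[\frac{\partial E(v_k,h)}{\partial \theta}\right],

where v_k is obtained from v_0 by k steps of Gibbs sampling. ECD is the expectation of \Delta\theta_{\mathrm{CD}_k} over the randomness of the Gibbs chain, so one natural way to approximate it, used here only as an illustrative assumption, is to average the CD-k estimates of m independent chains started from the same v_0:

    \Delta\theta_{\mathrm{ACD}} \approx \frac{1}{m}\sum_{i=1}^{m} \Delta\theta_{\mathrm{CD}_k}^{(i)}.

Averaging leaves the bias (the ECD approximation error) unchanged but reduces the variance of the stochastic error term, which is consistent with the claim that ACD is closer to the true log-likelihood gradient than a single CD-k estimate.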
At the same time, ACD is analyzed on simulated and real data tasks. On the simulated data, RBMs with one visible and one hidden unit and RBMs with 12 visible and 10 hidden units are considered; the MNIST dataset is used as the real data task. The experimental results are consistent with the theoretical analysis: ACD approximates the log-likelihood gradient better than the other algorithms. The experiments show that ACD outperforms the CD and PCD algorithms and that its convergence is more stable.
Keywords/Search Tags: Restricted Boltzmann Machines, Contrastive Divergence, Average Contrastive Divergence, convergence, log-likelihood gradient