
Parameter Choice For Boltzmann Machines: Theories And Applications

Posted on: 2017-06-07 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X C Zhao | Full Text: PDF
GTID: 1318330515467069 | Subject: Computer application technology
Abstract/Summary:
Deep learning models have drawn increasing attention due to their impressive empirical performance in various application areas. Despite these practical successes, the fundamental principles behind the design and training of deep architectures remain under debate. In practice, researchers rely on constant trial and error to set the network structure and to control the training process. This lack of theoretical guidance has made design complexity the bottleneck to wider application.

The focus of this thesis is general parameter choice (or reduction) in neural networks: using as few parameters as possible while retaining as much information about the probability distribution as possible, in order to improve the model's computational efficiency and generalization capability. We concentrate on the Boltzmann machine (BM) for two main reasons: 1) the BM is widely used as a basic component of deep learning models; 2) information geometry (IG) provides a unified perspective and analytical tool for BMs. Based on IG, we formalize parameter choice as an optimization problem: maximally preserve the geometric structure of the statistical manifold.

Specifically, the main contributions of this thesis are:

1. We propose a general parameter choice criterion for the family of multivariate binary distributions. Based on IG, we define the relative importance (called the confidence) of a parameter as its contribution to the expected Fisher-Rao information distance within the geometric manifold over the neighbourhood of the underlying true distribution. High-confidence parameters are preserved, while low-confidence parameters are set to a neutral value; this criterion is therefore called the confident-information-first (CIF) principle. We prove that the CIF principle leads to a submanifold that maximally preserves the expected Fisher-Rao information distance between any distribution and its ε-sphere neighborhood (a numerical sketch of the criterion follows this list).

2. We analyze different types of Boltzmann machines as implementations of CIF, which reveals, from a model-selection viewpoint, which essential parts of the target density a BM can capture.

3. We propose an efficient CIF-based model selection algorithm for BMs when a sample is given. Based on CIF, the confidence values determine a priority order over the parameters. Furthermore, a hypothesis test on the confidence decides whether a given parameter should be preserved, which effectively reduces the time complexity of parameter choice.

4. We develop a CIF-based method to regularize the network structure of deep neural networks, so as to alleviate overfitting during training. Parameter confidences measure the importance of connections, and the sub-network consisting of only the confident connections is called ConfNet. We also modify the training algorithm for deep ConfNets, dynamically adjusting the network during training to balance model complexity against sample size (see the training-loop sketch after this list).
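The confidence criterion of contribution 1 can be illustrated numerically. The sketch below is a loose stand-in, not the thesis's algorithm: for a fully visible Boltzmann machine over binary variables, the Fisher information of an exponential family is the covariance of its sufficient statistics, so the sample variance of each statistic serves as a crude diagonal proxy for a parameter's contribution to the Fisher-Rao distance. All names (`parameter_confidences`, `select_parameters`, `keep_ratio`) are illustrative, not from the thesis.

```python
# A minimal sketch of confidence-based parameter choice for a fully
# visible Boltzmann machine p(x) ∝ exp(b·x + x^T W x / 2), x ∈ {0,1}^n.
# The "confidence" here is a simple proxy for the CIF criterion: the
# diagonal Fisher information, which for an exponential family is the
# variance of the corresponding sufficient statistic.
import numpy as np

def parameter_confidences(samples: np.ndarray) -> dict:
    """Return a confidence score for every bias b_i and coupling W_ij.

    samples: (m, n) array of binary observations.
    """
    n = samples.shape[1]
    conf = {}
    # Bias b_i has sufficient statistic x_i; its diagonal Fisher
    # information is Var[x_i], estimated from the sample.
    for i in range(n):
        conf[("b", i)] = samples[:, i].var()
    # Coupling W_ij has sufficient statistic x_i * x_j.
    for i in range(n):
        for j in range(i + 1, n):
            conf[("W", i, j)] = (samples[:, i] * samples[:, j]).var()
    return conf

def select_parameters(conf: dict, keep_ratio: float = 0.5) -> set:
    """Keep the top fraction of parameters by confidence; the rest are
    fixed to a neutral value (0), following the CIF idea of preserving
    the high-confidence coordinates of the statistical manifold."""
    ranked = sorted(conf, key=conf.get, reverse=True)
    k = max(1, int(keep_ratio * len(ranked)))
    return set(ranked[:k])

rng = np.random.default_rng(0)
X = (rng.random((1000, 6)) < 0.3).astype(float)  # toy binary data
kept = select_parameters(parameter_confidences(X), keep_ratio=0.4)
print(f"{len(kept)} parameters retained")
```

In the thesis's algorithm (contribution 3), a hypothesis test on the confidence replaces the fixed `keep_ratio` cutoff used above, which is how the time complexity of the choice is reduced.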
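Contribution 4 describes ConfNet's dynamic adjustment only at a high level, so the following training-loop sketch is an assumption-laden illustration rather than the thesis's method. Since the abstract does not specify how confidences are computed during training, the sketch substitutes a running empirical-Fisher proxy (the mean squared gradient per weight); `train_confnet` and its arguments are hypothetical names.

```python
# A rough sketch of ConfNet-style training: after each epoch the
# network keeps only the "confident" connections and continues
# training, balancing model complexity against sample size.
import numpy as np

def train_confnet(W, grad_fn, data, epochs=10, lr=0.1, keep_ratio=0.7):
    """W: weight matrix; grad_fn(W, batch) -> gradient of the loss.

    A binary mask is recomputed each epoch so that only the most
    confident fraction of weights (highest running squared gradient,
    an empirical-Fisher proxy) stays active.
    """
    mask = np.ones_like(W)
    fisher = np.zeros_like(W)  # running empirical Fisher diagonal
    for _ in range(epochs):
        for batch in data:
            g = grad_fn(W * mask, batch)
            fisher = 0.9 * fisher + 0.1 * g ** 2
            W -= lr * g * mask          # update only active connections
        # Re-select the confident sub-network for the next epoch.
        thresh = np.quantile(fisher, 1.0 - keep_ratio)
        mask = (fisher >= thresh).astype(W.dtype)
    return W * mask

# Toy usage: least-squares on random data (purely illustrative).
rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 8)), rng.normal(size=(200, 1))
batches = [(X[i:i + 20], y[i:i + 20]) for i in range(0, 200, 20)]
grad = lambda W, b: 2 * b[0].T @ (b[0] @ W - b[1]) / len(b[0])
W_final = train_confnet(rng.normal(size=(8, 1)), grad, batches)
```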
Keywords/Search Tags: Boltzmann Machine, Information Geometry, Parameter Reduction, Model Selection, Deep Neural Networks