Clustering-based Stochastic Gradient Markov Chain Monte Carlo

Posted on:2019-03-01

Degree:Master

Type:Thesis

Country:China

Candidate:T F Fu

Full Text:PDF

GTID:2370330590967363

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Bayesian Learning is famous for its ability to encode knowledge(in terms of prior)and data information(in terms of likelihood).Markov Chain Monte Carlo(MCMC)is the mainstream Bayesian sampling method.However,in today's big data era,Markov Chain Monte Carlo scales poorly for large-scale dataset,because each iteration of Markov Chain Monte Carlo requires a traversal of the whole data set.To address this issue,in recent years,stochastic gradient Markov Chain Monte Carlo(SG-MCMC)methods have been raised to process large-scale dataset by iterative learning from small minibatches.However,the high variance caused by naive subsampling usually slows down the convergence to the desired posterior distribution.To handle this problem,In this paper,we devised an effective subsampling strategy to reduce the variance based on a failed attempt to do importance sampling.In particular,before sampling,we partition the dataset with k-means clustering algorithm in a preprocessing step and use the fixed clustering throughout the entire MCMC simulation.Then during simulation,we approximate the gradient of log-posterior via summing the estimated gradient of each cluster.In this subsampling strategy,we obtain fixed number data points from each cluster(which is obtained via clustering algorithm)to obtain an estimation to the gradient of the cluster.Then via adding certain weight to the gradient estimation to all the cluster,we have an estimation to the gradient of the whole dataset.We found that this clustering-based subsampling strategy can estimated a gradient that have smaller variance compared with the gradient estimated by the naive subsampling strategy.so it can accelerate the convergence to the desired posterior distribution.Worth to mention that the resulting procedure is surprisingly simple without enhancing the complexity of the original algorithm during the sampling procedure,which makes our method more applicable to many real-world applications.For experiment,we test our Clustering-based scheme on three sampling tasks,including stochastic gradient Langevin dynamics,stochastic gradient Hamilton Monte Carlo and stochastic gradient Riemann Langevin dynamics.Thorough empirical studies have been conducted to validate the effectiveness and efficiency of our proposed Clustering-based stochastic gradient Markov Chain Monte Carlo approach.

Keywords/Search Tags:

Bayesian Learning, Markov Chain Monte Carlo(MCMC), stochastic gradient Markov Chain Monte Carlo(SG-MCMC)

PDF Full Text Request

Related items

1	Improvement And Application Of MCMC Method For Paramter Estimation Of SV Models
2	Markov Chain Monte Carlo Method And Implemented With R
3	Convergence of adaptive Markov Chain Monte Carlo algorithms
4	The Application Of Bayesian Method In Actuarial
5	Optimization Algorithms In Stochastic Model Predictive Control Based On Monte Carlo Methods
6	Fraction Of O-u Process Of Bayesian Analysis And Its Applications
7	The Research On Bayesian Reliability Evaluation Based On Maximum Entropy And Markov Chain Monte Carlo Method
8	Markov Chain Monte Carlo Algorithms And Its Application In MIMO Detection
9	Bayesian Tests For Aberrant Behavior Based On The Box-Cox Response Time Modeling
10	Pre-stack Seismic Inversion Based On Markov Chain Monte Carlo Method