On population based Markov chain Monte Carlo methods

Posted on:2006-11-26

Degree:Ph.D

Type:Dissertation

University:Harvard University

Candidate:Goswami, Gopika R

Full Text:PDF

GTID:1458390008452118

Subject:Statistics

Abstract/Summary:

Real-parameter Evolutionary Monte Carlo (EMC) algorithm, a "population based" method, has been proposed as an effective tool not only for sampling from multi-modal distributions but also for stochastic optimization. The authors (Liang and Wong, 2001) have shown by various well-chosen examples that it works better than parallel tempering. It has a certain ability to learn from the past and it improves mixing by sampling along a temperature ladder. We introduce four new "exchange moves" which equip the algorithm with more learning capability while improving the quality of the samples, as measured by Effective Sample Size (ESS), produced from the target distribution. This new algorithm is called the Target Oriented EMC (TOEMC) algorithm. Secondly, we present a new easy to implement strategy for determining the temperature range and construction of the temperature ladder. Apart from heuristics, a concrete solution to this problem was non-existent in the literature until now. We apply TOEMC and temperature ladder construction technique to various multi-modal distributions, namely, mixture of multivariate (2 and 50 dimensional) Normals and Bayesian Neural Networks and find a huge performance improvement with respect to ESS. Thirdly, we generalize a result originally proved in (Gilks, Roberts, George, 1994) about the validity of the conditional (or line) sampling step in the context of the snooker algorithm which is a type of move involved in the EMC. Fourthly, in a similar set up of population based sampling we address the problem of clustering. It may arise in the context of clustering data points according to some objective function (e.g. K-means clustering) or we may have a posterior (e.g. posterior from a Dirichlet Process prior) density of cluster indicators.; We cast both kinds of problems in the framework of sampling for cluster indicators. So far, Gibbs sampling, "split-merge" Metropolis-Hasting algorithm and various modifications of these (see Jain, Neal, 2004, Dahl, 2003 and Jensen, Liu, 2005) have been the basic tools used for sampling in this context. We introduce two new "crossover moves" (based on swapping sub-clusters) which make such an algorithm very efficient with respect to Integrated Autocorrelation Time (IAT) of various relevant statistics and also with respect to the ability to escape from local modes. One of these new moves has the very desirable property of leaving a "good parent" unchanged in a crossover while producing only one "new child". Designing a probabilistic crossover move with such a property has been a long standing problem to which we present a first-cut solution. We call this new algorithm Population Based Clustering (PBC) algorithm. We apply PBC algorithm to motif clustering, Beta mixture Bernoulli clustering where we find a great deal of improvement with respect to IAT as mentioned above. We also look at clustering of mixture of Normals and compare the performance PBC algorithm as a stochastic optimizer with K-means clustering and find that PBC can very easily escape from local modes.

Keywords/Search Tags:

Algorithm, Population, Clustering, PBC, EMC

Related items

1	On population based Markov chain Monte Carlo methods
2	Population Topology Structure-based Differential Evolution Algorithm And Its Applications In Clustering
3	Initializing the EM Algorithm for Data Clustering and Sub-population Detection
4	Artificial Plants With A Dynamic Population Algorithm
5	The Improved Hybrid Genetic Algorithm Based On Clustering
6	Clustering Analysis Methods Based On Swarm Intelligent Algorithm
7	Modeling And Forecasting The Age Structure Of Population
8	Application Of Parallel Genetic Algorithm For The Dynamic Job Shop Scheduling Problem
9	A Multiple Population Genetic Algorithm Based On Population Cluster
10	The Research And Application Of Fuzzy Clustering Based On Improved Genetic Algorithm