
Bayesian inference with mixtures of logistic regression: Functional approximation, statistical consistency and algorithmic convergence

Posted on: 2009-12-14
Degree: Ph.D
Type: Thesis
University: Northwestern University
Candidate: Ge, Yang
Full Text: PDF
GTID: 2448390002494316
Subject: Statistics
Abstract/Summary:
One of the most commonly used techniques for classification problems is logistic regression. For a binary response, logistic regression assumes that the odds satisfy Pr(y = 1|x)/Pr(y = 0|x) = e^(alpha + beta'x). In reality, however, the pattern of the data can be so complicated that this model often fails, as it is not very flexible. The 'mixtures of experts' (ME) approach enriches the model by mixing m (m ≥ 2) 'experts', with each expert being one such logistic regression.

This thesis first studies the approximation rate of the proposed ME models, in which binary or multinomial logistic regression models are mixed. We show that the combined models can be used to approximate a class of 'smooth' probability models for binary or multiclass responses. When the log-odds have bounded second derivatives, the approximation rate is O(m^(-2/s)) in Hellinger distance, or O(m^(-4/s)) in Kullback-Leibler divergence, where s = dim(x) is the dimension of the input space (i.e., the number of predictors).

Bayesian inference conditional on the observed data can then be used for regression and classification. The thesis first considers the binary classification problem. Conditions are established for choosing the number of experts (i.e., the number of mixing components) k, or for choosing a prior distribution on a random number of experts K, so that Bayesian inference is 'consistent', in the sense of 'often approximating' the underlying true relationship between y and x. The resulting classification rule is also 'consistent', in the sense of having near-optimal classification performance. We establish these desirable consistency properties with a nonstochastic k growing slowly with the sample size n of the observed data, or with a random K that takes large values with nonzero but small probabilities. For the multiclass problem, we then show that 'consistency' in regression and classification can be achieved simultaneously for all classes when posterior-based inference is performed in a Bayesian framework.

The Metropolis-Hastings algorithm is one of the important Markov chain Monte Carlo (MCMC) methods for sampling from a posterior distribution. The thesis outlines methods and strategies for constructing the simple Metropolis-Hastings chain for ME with a nonstochastic number of experts, as well as the hybrid Metropolis-Hastings chain for ME with a random number of experts. In addition, recent results in Markov chain theory are applied to both the simple and hybrid algorithms, and conditions are established under which the algorithms converge.
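To make the mixture concrete, here is a minimal numerical sketch of the ME predictive probability P(y = 1|x) for a binary response, in which a softmax gating network mixes m logistic-regression experts. The softmax gating form is the standard ME construction, but the function names and all parameter values below are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def me_prob(x, gate_params, expert_params):
    """P(y = 1 | x) under a mixture of m logistic-regression experts.

    gate_params:   (m, s+1) array; row j holds the gating coefficients
                   (gamma_j, delta_j) for the softmax gating network.
    expert_params: (m, s+1) array; row j holds the logistic-expert
                   coefficients (alpha_j, beta_j).
    """
    x1 = np.concatenate(([1.0], x))        # prepend the intercept
    scores = gate_params @ x1
    g = np.exp(scores - scores.max())      # numerically stable softmax
    g /= g.sum()                           # gating weights, sum to 1
    experts = sigmoid(expert_params @ x1)  # each expert's P(y = 1 | x)
    return float(g @ experts)              # mixture probability

# Two experts (m = 2) in s = 2 dimensions, with made-up coefficients.
gates = np.array([[0.0, 1.0, -1.0],
                  [0.0, -1.0, 1.0]])
experts = np.array([[0.5, 2.0, 0.0],
                    [-0.5, 0.0, 2.0]])
print(me_prob(np.array([0.3, -0.7]), gates, experts))
```

With m = 1 this reduces to ordinary logistic regression; increasing m is what buys the extra flexibility discussed above.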
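For reference, the two discrepancies in which the approximation rates are stated have the following standard definitions, for densities p and q with respect to a dominating measure mu (textbook background, not material quoted from the thesis):

```latex
d_H^2(p, q) = \int \left( \sqrt{p} - \sqrt{q} \right)^2 d\mu,
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = \int p \log \frac{p}{q} \, d\mu .
```

Since d_H^2(p, q) ≤ D_KL(p‖q), a Kullback-Leibler rate of O(m^(-4/s)) directly implies the O(m^(-2/s)) Hellinger rate.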
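The final paragraph refers to Metropolis-Hastings sampling from the posterior. Below is a generic random-walk Metropolis sketch; `log_post` is a hypothetical function returning the log posterior density of the stacked ME parameters, and the thesis's simple and hybrid chains are considerably more elaborate than this.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iter=10_000, step=0.1, seed=0):
    """Random-walk Metropolis sampler.

    log_post: callable returning the (unnormalized) log posterior
              density at a parameter vector theta.
    theta0:   starting value, e.g. stacked gating/expert parameters.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    samples = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal)
        # Symmetric Gaussian proposal, so the Hastings ratio reduces
        # to the posterior ratio; accept with probability min(1, ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        samples[t] = theta
    return samples

# Sanity check against a standard normal "posterior" in 3 dimensions.
draws = metropolis_hastings(lambda th: -0.5 * th @ th, np.zeros(3))
print(draws.mean(axis=0), draws.std(axis=0))
```

Note that the "hybrid" chain for a random number of experts must also move between models of different dimensions, which a fixed-dimension random-walk sampler like this one does not handle.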
Keywords/Search Tags: Logistic regression, Bayesian inference, Classification, Approximation, Markov chain