Font Size: a A A

Automatic Model Selection on Local Gaussian Structures with Priors: Comparative Investigations and Applications

Posted on:2013-05-18Degree:Ph.DType:Thesis
University:The Chinese University of Hong Kong (Hong Kong)Candidate:Shi, LeiFull Text:PDF
GTID:2458390008965561Subject:Statistics
Abstract/Summary:
Model selection aims to determine an appropriate model scale given a small size of samples, which is an important topic in machine learning. As one type of efficient solution, an automatic model selection starts from a large enough model scale, and has an intrinsic mechanism to push redundant structures to be ineffective and thus discarded automatically during learning. Priors are usually imposed on parameters to facilitate an automatic model selection. There still lack systematic comparisons on automatic model selection approaches with priors, and this thesis is motivated for such a study based on models with local Gaussian structures.;Particularly, we compare the relative strength and weakness of three typical automatic model selection approaches, namely Variational Bayesian (VB), Minimum Message Length (MML) and Bayesian Ying-Yang (BYY) harmony learning, on models with local Gaussian structures. First, we consider Gaussian Mixture Model (GMM), for which the number of Gaussian components is to be determined. Further assuming each Gaussian component has a subspace structure, we extend to consider two models namely Mixture of Factor Analyzers (MFA) and Local Factor Analysis (LFA), for both of which the component number and local subspace dimensionalities are to be determined.;Two types of priors are imposed on parameters, namely a conjugate form prior and a Jeffreys prior. The conjugate form prior is chosen as a Dirichlet-Normal-Wishart (DNW) prior for GMM, and as a Dirichlet-Normal-Gamma (DNG) prior for both MFA and LFA. The Jeffreys prior and the MML approach are not considered on MFA/LFA due to the difficulty in deriving the corresponding Fisher information matrix. Via extensive simulations and applications, comparisons on the automatic model selection algorithms (six for GMM and four for MFA/LFA), we get following main findings:;1. Considering priors on all parameters makes each approach perform better than considering priors merely on the mixing weights.;2. For all the three approaches on GMM, the performance with the DNW prior is better than with the Jeffreys prior. Moreover, Jeffreys prior makes MML slightly better than VB, while the DNW prior makes VB better than MML.;3. As the DNW prior hyper-parameters on GMM are changed from fixed to freely optimized by each of its own learning principle, BYY improves its performance, while VB and MML deteriorate their performances. This observation remains the same when we compare BYY and VB on either MFA or LFA with the DNG prior. Actually, VB and MML lack a good guide for optimizing prior hyper-parameters.;4. For both GMM and MFA/LFA, BYY considerably outperforms both VB and MML, for any type of priors and whether hyper-parameters are optimized. Being different from VB and MML that rely on appropriate priors, BYY does not highly depend on the type of priors. It performs already well without priors and improves by imposing a Jeffreys or a conjugate form prior.;5. Despite the equivalence in maximum likelihood parameter learning, MFA and LFA affect the performances by VB and BYY in automatic model selection. Particularly, both BYY and VB perform better on LFA than on MFA, and the superiority of LFA is reliable and robust.;In addition to adopting the existing algorithms either directly or with some modifications, this thesis develops five new algorithms to fill the missing gap. Particularly on GMM, the VB algorithm with Jeffreys prior and the BYY algorithm with DNW prior are developed, in the latter of which a multivariate Student’s T-distribution is obtained as the posterior via marginalization. On MFA and LFA, BYY algorithms with DNG priors are developed, where products of multiple Student’s T-distributions are obtained in posteriors via approximated marginalization. Moreover, a VB algorithm on LFA is developed as an alternative choice to the existing VB algorithm on MFA.
Keywords/Search Tags:Model selection, Prior, Local gaussian structures, LFA, MFA, VB algorithm, BYY, MML
Related items