Font Size: a A A

Investigation on Bayesian Ying-Yang learning for model selection in unsupervised learning

Posted on:2007-06-05Degree:Ph.DType:Thesis
University:The Chinese University of Hong Kong (Hong Kong)Candidate:Hu, XueleiFull Text:PDF
GTID:2448390005978548Subject:Computer Science
Abstract/Summary:
Model selection is a critical issue in unsupervised learning. Conventionally, model selection is implemented in two phases by some statistical model selection criterion such as Akaike's information criterion (AIC), Bozdogan's consistent Akaike's information criterion (CAIC), Schwarz's Bayesian inference criterion (BIC) which formally coincides with the minimum description length (MDL) criterion, and the cross-validation (CV) criterion. These methods are very time intensive and may become problematic when sample size is small. Recently, the Bayesian Ying-Yang (BYY) harmony learning has been developed as a unified framework with new mechanisms for model selection and regularization. In this thesis we make a systematic investigation on BYY learning as well as several typical model selection criteria for model selection on factor analysis models, Gaussian mixture models, and factor analysis mixture models.; For factor analysis models, we develop an improved BYY harmony data smoothing learning criterion BYY-HDS in help of considering the dependence between the factors and observations. We make empirical comparisons of the BYY harmony empirical learning criterion BYY-HEC, BYY-HDS, the BYY automatic model selection method BYY-AUTO, AIC, CAIC, BIC, and CV for selecting the number of factors not only on simulated data sets of different sample sizes, noise variances, data dimensions and factor numbers, but also on two real data sets from air pollution data and sport track records, respectively.; The most remarkable findings of our study is that BYY-HDS is superior to its counterparts, especially when the sample size is small. AIC, BYY-HEC, BYY-AUTO and CV have a risk of overestimating, while BIC and CAIC have a risk of underestimating in most cases. BYY-AUTO is superior to other methods in a computational cost point of view. The cross-validation method requires the highest computing cost. (Abstract shortened by UMI.)...
Keywords/Search Tags:Model selection, BYY, Bayesian, Criterion
Related items