Investigation on Bayesian Ying-Yang learning for model selection in unsupervised learning

Posted on:2007-06-05

Degree:Ph.D.

Type:Thesis

University:The Chinese University of Hong Kong (Hong Kong)

Candidate:Hu, Xuelei

GTID:2448390005978548

Subject:Computer Science

Abstract/Summary:

Model selection is a critical issue in unsupervised learning. Conventionally, model selection is implemented in two phases using a statistical model selection criterion such as Akaike's information criterion (AIC), Bozdogan's consistent Akaike's information criterion (CAIC), Schwarz's Bayesian inference criterion (BIC), which formally coincides with the minimum description length (MDL) criterion, or the cross-validation (CV) criterion. These methods are very time-consuming and may become problematic when the sample size is small. Recently, Bayesian Ying-Yang (BYY) harmony learning has been developed as a unified framework with new mechanisms for model selection and regularization. In this thesis we systematically investigate BYY learning, along with several typical model selection criteria, for model selection on factor analysis models, Gaussian mixture models, and factor analysis mixture models.

For factor analysis models, we develop an improved BYY harmony data smoothing learning criterion, BYY-HDS, by taking into account the dependence between the factors and the observations. We empirically compare the BYY harmony empirical learning criterion BYY-HEC, BYY-HDS, the BYY automatic model selection method BYY-AUTO, AIC, CAIC, BIC, and CV for selecting the number of factors, both on simulated data sets with different sample sizes, noise variances, data dimensions, and factor numbers, and on two real data sets, one of air pollution data and one of sports track records.

The most remarkable finding of our study is that BYY-HDS is superior to its counterparts, especially when the sample size is small. In most cases, AIC, BYY-HEC, BYY-AUTO, and CV risk overestimating the number of factors, while BIC and CAIC risk underestimating it. BYY-AUTO is superior to the other methods in terms of computational cost, whereas cross-validation has the highest computational cost. (Abstract shortened by UMI.)
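
As a rough illustration of the conventional two-phase procedure the abstract refers to (fit each candidate model first, then score it with a criterion), the following is a minimal Python sketch. It is not code from the thesis. It assumes the textbook forms of AIC, BIC, and CAIC, and a commonly used free-parameter count for a zero-mean factor analysis model with diagonal noise covariance. The function names and the log-likelihood values are hypothetical, chosen only to show the mechanics.

```python
import math


def fa_free_params(n_dims, k):
    """Assumed free-parameter count for a zero-mean factor analysis model:
    n_dims*k loadings + n_dims diagonal noise variances,
    minus k*(k-1)/2 for the rotational indeterminacy of the loadings."""
    return n_dims * k + n_dims - k * (k - 1) // 2


def criteria(log_lik, d, n_samples):
    """Textbook penalized-likelihood criteria (smaller is better)."""
    return {
        "AIC": -2.0 * log_lik + 2.0 * d,
        "BIC": -2.0 * log_lik + d * math.log(n_samples),
        "CAIC": -2.0 * log_lik + d * (math.log(n_samples) + 1.0),
    }


def select_k(log_liks, n_dims, n_samples):
    """Phase 2: given the maximized log-likelihood for each candidate k
    (phase 1, done elsewhere), return the k that minimizes each criterion."""
    scores = {
        k: criteria(ll, fa_free_params(n_dims, k), n_samples)
        for k, ll in log_liks.items()
    }
    return {
        name: min(scores, key=lambda k: scores[k][name])
        for name in ("AIC", "BIC", "CAIC")
    }


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for k = 1..4 factors
    # (made-up numbers, not results from the thesis).
    example_log_liks = {1: -800.0, 2: -770.0, 3: -758.0, 4: -755.0}
    print(select_k(example_log_liks, n_dims=10, n_samples=50))
```

Because the model must be fitted once for every candidate number of factors before any criterion can be evaluated, the cost grows with the number of candidates, which is the expense the abstract attributes to the two-phase approach. Cross-validation multiplies this further by refitting on each data split.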

Keywords/Search Tags:

Model selection, BYY, Bayesian, Criterion

Related items

1	Investigation on Bayesian Ying-Yang learning for model selection in unsupervised learning
2	Based On The Improved Standards Of Bayesian Network Model
3	Bayesian Network Classifiers And Application
4	Research On Bayesian Learning Theory And Its Application
5	Research And Applications Of Clustering Algorithms With The Model Selection Ability
6	Bayesian Feature Selection For Text Classification
7	Research On Multi-Dimension Bayesian Network Classifiers Based On Feature Selection
8	A Method Of Feature Selection Based On Extended Bayesian Information Criteria In Software Defect Prediction
9	A Bayesian network model of knowledge-based authentication
10	Structure Learning In Bayesian Networks And Construct MBNC Experimental Platform