Bolstering CART and Bayesian variable selection methods for classification

Posted on: 2003-09-02
Degree: Ph.D
Type: Thesis
University: Texas A&M University
Candidate: Sha, Naijun
Full Text: PDF
GTID: 2468390011978303
Subject: Statistics
Abstract/Summary:
An important problem in many areas is exploring the relationship between object categories and their observed characteristics. In particular, it is important to understand which measurements are related to a specific category. One way of tackling this kind of discriminant problem is a nonparametric method known as Classification and Regression Trees (CART). In this thesis, a stochastic step is added to the CART algorithm and an annealing schedule is used to find 'optimal' models. Two approaches to model selection are proposed to avoid overfitting.

For problems with high-dimensional and collinear data sets, we propose a Bayesian variable selection approach to multinomial probit models. Motivated by the binary probit model with latent variables, we build a multivariate extension to the case of more than two categories and use latent variables to specialize the general distributional setting to the linear model with Gaussian errors. We then apply Bayesian variable selection techniques that adopt natural conjugate prior distributions. A posteriori, we integrate out some of the parameters and perform inference on the marginal distribution of single models, using MCMC methods and truncated normal or Student-t sampling techniques to draw multivariate vectors. We apply the methodology to problems in both chemometrics and functional genomics: first to a dataset with three wheat varieties and 100 near-infrared absorbances as regressors, then to two datasets involving microarray data.
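The latent-variable step mentioned in the abstract can be illustrated with a minimal sketch. It covers only the binary probit case with unit error variance, where each latent value is drawn from a normal distribution truncated to the side of zero that matches the observed class. This is not the thesis's multivariate sampler; the function name `draw_latent` and its arguments are hypothetical, chosen for illustration.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf and inverse cdf


def draw_latent(mean, y, rng):
    """Draw z ~ N(mean, 1) truncated to z > 0 if y == 1, else to z <= 0.

    Binary probit only (an assumption of this sketch): y is 0 or 1 and
    `mean` plays the role of the linear predictor for one observation.
    Inverse-CDF sampling is done in the lower tail for numerical accuracy.
    """
    if y == 1:
        # z > 0  <=>  -(z - mean) < mean, and -(z - mean) is standard normal
        upper = _STD.cdf(mean)
        sign = -1.0
    else:
        # z <= 0  <=>  (z - mean) <= -mean
        upper = _STD.cdf(-mean)
        sign = 1.0
    # uniform draw on (0, upper); guard against an exact zero
    u = max(upper * rng.random(), 1e-300)
    return mean + sign * _STD.inv_cdf(u)


if __name__ == "__main__":
    rng = random.Random(0)
    # latent draws for one observation in class 1 and one in class 0
    print(draw_latent(-0.5, 1, rng), draw_latent(0.8, 0, rng))
```

In a full Gibbs sampler for the binary model, a draw like this would alternate with a draw of the regression coefficients given the latent values; that second step, and the variable-selection moves, are omitted here.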
Keywords/Search Tags:Bayesian variable selection, CART