Statistical learning and predictive modeling in data mining

Posted on: 2007-11-30
Degree: Ph.D.
Type: Thesis
University: The Ohio State University
Candidate: Li, Bin
Full Text: PDF
GTID: 2448390005966286
Subject: Statistics
Abstract/Summary:
This research focuses on the Bayesian robustness properties of regularized optimization methods and on developing a hybrid predictive modeling strategy that emphasizes model interpretation.

It is known that many regularized optimization methods have a Bayesian interpretation. In the first part of the thesis, we consider a class of flat-tailed priors for a general likelihood function, in the same spirit as the t-distribution suggested as a flat-tailed prior for the normal likelihood. We formalize the robustness property in terms of the relative tail behaviors of the likelihood and the priors. Using this setup, we examine the robustness properties of the bridge regression family and the group LASSO, as well as the consistency issue for the LASSO solution.

In the second part, we suggest a two-phase boosting method, called "additive regression tree and smoothing splines" (ARTSS), which is highly competitive in predictive performance. Unlike many automated learning procedures, which lack interpretability and operate as a "black box", ARTSS allows us to (1) estimate the marginal effect smoothly; (2) test the significance of non-additive effects; (3) provide a measure of relative variable importance for main effects and interactions; and (4) select variables and/or incorporate hierarchical structure in modeling. Finally, we apply ARTSS to two large public-domain data sets and discuss the understanding developed from the model.
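
The Bayesian interpretation of regularized optimization mentioned in the abstract can be illustrated with a minimal sketch, which is not taken from the thesis. It assumes a single observation y with a normal likelihood of unit variance and a Laplace prior with rate `lam` on the coefficient. Under those assumptions the posterior mode (MAP estimate) coincides with the LASSO solution, which here reduces to soft thresholding. The function names are hypothetical and chosen only for this illustration.

```python
import math


def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5 * (y - b)**2 + lam * abs(b) over b."""
    return math.copysign(max(abs(y) - lam, 0.0), y)


def neg_log_posterior(b, y, lam):
    """Negative log posterior of b, up to an additive constant.

    Assumes y ~ Normal(b, 1) and a Laplace prior on b with rate lam,
    so the prior contributes the L1 penalty lam * abs(b).
    """
    return 0.5 * (y - b) ** 2 + lam * abs(b)


def map_by_grid(y, lam, lo=-10.0, hi=10.0, steps=200001):
    """Brute-force MAP estimate on an evenly spaced grid (illustration only)."""
    width = (hi - lo) / (steps - 1)
    grid = (lo + i * width for i in range(steps))
    return min(grid, key=lambda b: neg_log_posterior(b, y, lam))


if __name__ == "__main__":
    # Compare the grid-search posterior mode with the closed-form LASSO solution.
    for y in (-3.0, -0.5, 0.2, 2.5):
        print(y, soft_threshold(y, 1.0), map_by_grid(y, 1.0))
```

Replacing the penalty `lam * abs(b)` with `lam * abs(b) ** gamma` would give the analogous correspondence for a bridge-type penalty, though the closed-form solution above holds only for the L1 case.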
Keywords/Search Tags:Modeling, Predictive, ARTSS