
Inverse problems in high dimensional stochastic systems under uncertainty

Posted on: 2011-09-21
Degree: Ph.D.
Type: Dissertation
University: University of Michigan
Candidate: Harrington, Patrick Lloyd, Jr
Full Text: PDF
GTID: 1448390002467432
Subject: Statistics
Abstract/Summary:
Increasingly often, problems in modern medicine, quantitative finance, or social networking involve tens of thousands of variables that interact with each other and jointly evolve over time. The states of these variables may correspond to the phenotype of a particular individual, the price of a security, or the current status of an individual's social networking profile. If these states are hidden from a researcher, additional information must be obtained to infer them from measurements of other variables, knowledge of the interacting network structure, and any dynamics that model the evolution of these states. This dissertation is an attempt to address general problems regarding reasoning under uncertainty in such spatio-temporal models, with an emphasis on applications in predictive health and disease in a loosely monitored population of individuals. The motivation is highly interdisciplinary and draws on tools and concepts from machine learning, statistics, epidemiology, bioinformatics, and physics.

We begin by presenting a solution to recursively sampling the best subset of nodes/variables that elicits the largest expected information gain over all sampled and un-sampled nodes in a large spatio-temporal complex network. We use methods from information theory and approximate Bayesian filtering to achieve this task. We then present a tractable method for empirically estimating the spatio-temporal graphical model structure corresponding to the "susceptible", "infected", and "recovered" (SIR) model of mathematical epidemiology. Here, we formulate the problem as an ℓ1-penalized likelihood convex program and produce network detection performance superior to other comparable state-of-the-art methods. We present a logistic regression classifier that is robust to worst-case bounded measurement uncertainty. The proposed method produces superior worst-case detection performance to the standard ℓ1-logistic regression classifier on a Human rhinovirus (HRV) gene expression data set. The relationship between sparsity-promoting regularization penalties and robustness to bounded measurement uncertainty is also established. The final chapter identifies the appropriate basis functions to use in a classification model when the data is both high-dimensional and temporally sampled, with the ultimate goal of discriminating between multiple states/labels, e.g., phenotypes. We utilize Gaussian Processes and ℓ1-logistic regression to accomplish this task and apply it to a human gene expression time-series data set resulting from a challenge study inoculation with Human Influenza A/H3N2, HRV, and Human respiratory syncytial virus (RSV).
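As a rough illustration of the stated link between sparsity-promoting penalties and robustness to bounded measurement uncertainty, the standard ℓ1-logistic regression objective can be compared with a worst-case logistic loss. This is only a sketch under assumptions not given in the abstract: labels y_i in {-1, +1}, an additive perturbation δ_i of each measurement x_i bounded in the ℓ∞ norm by a hypothetical radius ρ, and illustrative symbols β, β_0, λ. The formulation actually used in the dissertation may differ.

```latex
% Standard l1-penalized logistic regression (textbook form; symbols are illustrative)
\min_{\beta,\,\beta_0}\;
  \frac{1}{n}\sum_{i=1}^{n}
  \log\!\bigl(1 + \exp\bigl(-y_i(\beta^{\top}x_i + \beta_0)\bigr)\bigr)
  + \lambda\,\lVert\beta\rVert_1

% Worst-case loss under an ASSUMED l-infinity bounded perturbation of x_i.
% Since \max_{\lVert\delta\rVert_\infty \le \rho} \bigl(-y_i\,\beta^{\top}\delta\bigr)
%   = \rho\,\lVert\beta\rVert_1
% (dual-norm identity) and \log(1+\exp(\cdot)) is increasing:
\max_{\lVert\delta_i\rVert_\infty \le \rho}
  \log\!\bigl(1 + \exp\bigl(-y_i(\beta^{\top}(x_i+\delta_i) + \beta_0)\bigr)\bigr)
  = \log\!\bigl(1 + \exp\bigl(-y_i(\beta^{\top}x_i + \beta_0)
      + \rho\,\lVert\beta\rVert_1\bigr)\bigr)
```

Under these assumptions, the uncertainty radius ρ enters the worst-case loss only through the ℓ1 norm of β, which is one way an ℓ1-type penalty can arise from a robustness requirement.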
Keywords/Search Tags:Uncertainty, Human