Algebraic Complexity in Statistics using Combinatorial and Tensor Methods

Posted on: 2014-05-14
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Chicago
Candidate: Gross, Elizabeth
GTID: 1458390008956149
Subject: Mathematics
Abstract/Summary:
Fundamental questions in statistical modeling concern the best methods for model selection, goodness-of-fit testing, and parameter estimation. For example, given a collection of aligned DNA sequences from a group of extant species, how can we decide which evolutionary tree best describes the species' ancestral history? Or, given a sparse high-dimensional contingency table, how can we perform goodness-of-fit testing when exact tests are infeasible? In questions such as these, combinatorics, commutative algebra, and algebraic geometry play a leading role. We explore such questions for specific classes of models (e.g., toric models, phylogenetic models, and variance components models) and tackle the algebraic complexity problems that underlie them.

We begin by studying toric ideals of hypergraphs, algebraic objects used for goodness-of-fit testing of log-linear models. Using the combinatorics of hypergraphs, we give degree bounds on the generators of these ideals and sufficient conditions for a binomial in the ideal to be indispensable, show that the ideal of $\mathrm{Tan}((\mathbb{P}^1)^n)$ is generated by quadratics and cubics in cumulant coordinates, and recover a well-known complexity theorem in algebraic statistics due to De Loera and Onn.

Second, we study phylogenetic models by viewing them as sets of tensors of bounded rank. We show that the variety of $4 \times 4 \times 4$ complex-valued tensors with border rank at most 4 is defined by polynomials of degree 5, 6, and 9. This variety corresponds to the 4-state general Markov model on the claw tree $K_{1,3}$, and its defining polynomials can be used in model selection. This result also gives further evidence that the phylogenetic ideal of the model can be generated by polynomials of degree 9 or less.
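To make the link between the general Markov model on the claw tree $K_{1,3}$ and bounded-rank tensors concrete, here is a minimal Python sketch (standard library only). It assumes the usual parametrization: a hidden root with distribution `root` and one row-stochastic transition matrix per edge (`A`, `B`, `C`). All function names are hypothetical illustrations, not code from the dissertation. The leaf distribution is a sum of one rank-one tensor per root state, so with 4 states its rank is at most 4. The flattening check also shows why other equations are needed here: a $4 \times 16$ flattening always has rank at most 4, so flattening minors impose no condition on $4 \times 4 \times 4$ tensors of border rank at most 4.

```python
import itertools
import random


def random_stochastic_matrix(k, rng):
    """Return a k x k row-stochastic matrix: row h is P(leaf state | root = h)."""
    rows = []
    for _ in range(k):
        weights = [rng.random() + 1e-9 for _ in range(k)]  # strictly positive
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def claw_tree_distribution(root, A, B, C):
    """Leaf distribution of a general Markov model on K_{1,3} with a hidden root.

    P[i][j][l] = sum_h root[h] * A[h][i] * B[h][j] * C[h][l],
    a sum of len(root) rank-one tensors, so its rank is at most len(root).
    """
    k = len(root)
    P = [[[0.0] * k for _ in range(k)] for _ in range(k)]
    for h in range(k):
        for i, j, l in itertools.product(range(k), repeat=3):
            P[i][j][l] += root[h] * A[h][i] * B[h][j] * C[h][l]
    return P


def flatten_first_leaf(P):
    """k x k^2 flattening: rows indexed by leaf 1, columns by (leaf 2, leaf 3)."""
    k = len(P)
    return [[P[i][j][l] for j in range(k) for l in range(k)] for i in range(k)]


def matrix_rank(M, tol=1e-10):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            continue  # no usable pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, n_rows):
            factor = M[r][col] / M[rank][col]
            for c in range(col, n_cols):
                M[r][c] -= factor * M[rank][c]
        rank += 1
    return rank


if __name__ == "__main__":
    rng = random.Random(0)
    root = [0.1, 0.2, 0.3, 0.4]
    A, B, C = (random_stochastic_matrix(4, rng) for _ in range(3))
    P = claw_tree_distribution(root, A, B, C)
    print(sum(P[i][j][l] for i in range(4) for j in range(4) for l in range(4)))
    print(matrix_rank(flatten_first_leaf(P)))  # never exceeds 4 for a 4 x 16 matrix
```

Putting zero mass on some root states lowers the bound accordingly: with only two active root states, the flattening has rank at most 2.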
Finally, we study the algebraic complexity of maximum likelihood estimation for variance components models. We give explicit formulas for the ML and REML degrees of the random effects model for the one-way layout, along with examples of multimodal likelihood surfaces.
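The ML degree is the number of complex critical points of the likelihood equations for generic data, so it helps to see the objective being differentiated. The Python sketch below writes out the Gaussian log-likelihood of the one-way random effects model. It assumes the standard formulation $y_{ij} = \mu + a_i + e_{ij}$, with independent $a_i \sim N(0, \tau)$ and $e_{ij} \sim N(0, \omega)$, parametrized by the variances `tau` and `omega`. The function name and parametrization are illustrative assumptions, not taken from the dissertation, and REML (which uses the restricted likelihood) is not shown. For a group of size $n_i$, the covariance is $\omega I + \tau J$ with $J$ the all-ones matrix. The sketch uses the closed forms $\det(\omega I + \tau J) = \omega^{n_i-1}(\omega + n_i\tau)$ and $(\omega I + \tau J)^{-1} = \omega^{-1}\bigl(I - \tfrac{\tau}{\omega + n_i\tau} J\bigr)$, so no matrix inversion is needed.

```python
import math


def one_way_loglik(groups, mu, tau, omega):
    """Gaussian log-likelihood of the one-way random effects model.

    groups: list of lists; groups[i] holds the observations y_ij of group i
    mu: overall mean; tau: between-group variance; omega: within-group variance
    """
    if omega <= 0 or tau < 0:
        raise ValueError("need omega > 0 and tau >= 0")
    total = 0.0
    for y in groups:
        n = len(y)
        resid = [v - mu for v in y]
        s1 = sum(resid)                   # sum of residuals in the group
        s2 = sum(r * r for r in resid)    # sum of squared residuals
        lam = omega + n * tau             # eigenvalue of the covariance along the all-ones vector
        logdet = (n - 1) * math.log(omega) + math.log(lam)
        quad = (s2 - tau * s1 * s1 / lam) / omega  # r^T (omega I + tau J)^{-1} r
        total += -0.5 * (n * math.log(2 * math.pi) + logdet + quad)
    return total


if __name__ == "__main__":
    data = [[1.0, 2.5, 0.3], [4.0, 3.1], [2.2]]  # toy unbalanced layout
    for tau in (0.0, 0.5, 1.0):
        print(tau, one_way_loglik(data, mu=1.7, tau=tau, omega=0.8))
```

With `mu` fixed, evaluating `one_way_loglik` on a grid of `(tau, omega)` values is one simple way to inspect a likelihood surface numerically. The sketch makes no claim about where the maxima lie for any particular data set.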
Keywords/Search Tags: Algebraic complexity, Model, Goodness-of-fit testing, Degree