
On maximum likelihood estimation for contingency tables

Posted on: 2006-07-09
Degree: Ph.D.
Type: Thesis
University: Carnegie Mellon University
Candidate: Rinaldo, Alessandro
Full Text: PDF
GTID: 2450390008452116
Subject: Statistics
Abstract/Summary:
Log-linear models are a powerful statistical tool for the analysis of categorical data. Their use has increased greatly over the past decades with the compilation and distribution of large sparse databases, in the social and medical sciences as well as in machine learning applications. Log-linear modeling analysis depends crucially upon the Maximum Likelihood Estimate (MLE) of the expected value of the vector of observed counts, which is required for assessment of fit, model selection and interpretation. This thesis provides a generalization of the conditions for the existence of the MLE available in the statistical literature, proposes a constructive characterization of the cases in which the MLE is not defined and devises computational methods for extended maximum likelihood estimation.

Novel geometric and combinatorial conditions for the existence of the MLE are derived by exploiting the connections between algebraic and polyhedral geometry and the theory of exponential families. It is shown that log-linear models can be associated with extended linear exponential families of distributions parametrized, in a mean value sense, by non-negative points lying on toric varieties. Within the framework of extended exponential families, the MLE, more appropriately called the extended MLE, always exists and is unique. Various properties of the extended MLE are then derived and discussed.

Efficient procedures are proposed to compute the extended MLE and to adjust the dimension of the log-linear model for the possible non-estimability of some parameters. These methodologies take advantage of the geometric nature of the theoretical results about the existence of the MLE and are capable of identifying those cells whose mean values cannot be determined by maximizing the likelihood because of insufficient information in the data. The computational methods put forward in the thesis are based on existing algorithms, which are improved upon by allowing for the possibility of performing extended maximum likelihood estimation and by introducing efficient ways of generating and manipulating design matrices.

The results obtained in the thesis are of both theoretical and practical relevance for the log-linear model analysis of categorical data and, in particular, for conducting model selection in the presence of large, sparse contingency tables.
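As a small illustration of the phenomenon described above, the sketch below fits the independence model to a two-way table whose first row is entirely zero. For this model the MLE with strictly positive cell means exists only when every row and column total is positive, so here the fit has zeros in the empty row, which is the behaviour the abstract attributes to the extended MLE. The algorithm used is iterative proportional fitting, chosen only because it is a standard method; the abstract does not name the algorithms the thesis builds on, and the function name `ipf_independence` and the example counts are assumptions made for this sketch.

```python
import numpy as np

def ipf_independence(table, n_iter=50, tol=1e-10):
    """Fit the two-way independence model by iterative proportional fitting.

    Illustrative sketch only: it handles a single two-way table and is not
    the procedure developed in the thesis.
    """
    n = np.asarray(table, dtype=float)
    row_targets = n.sum(axis=1)   # observed row totals
    col_targets = n.sum(axis=0)   # observed column totals
    m = np.ones_like(n)           # strictly positive starting values

    for _ in range(n_iter):
        previous = m.copy()

        # Rescale rows to match the observed row totals.
        # A fitted row total of 0 keeps its factor at 0 instead of dividing by 0.
        row_sums = m.sum(axis=1)
        row_factor = np.divide(row_targets, row_sums,
                               out=np.zeros_like(row_sums),
                               where=row_sums > 0)
        m = m * row_factor[:, None]

        # Rescale columns to match the observed column totals.
        col_sums = m.sum(axis=0)
        col_factor = np.divide(col_targets, col_sums,
                               out=np.zeros_like(col_sums),
                               where=col_sums > 0)
        m = m * col_factor[None, :]

        if np.max(np.abs(m - previous)) < tol:
            break
    return m

# Hypothetical sparse table: the first row has no observations at all.
counts = np.array([[0, 0, 0],
                   [4, 2, 6],
                   [1, 3, 4]])

fitted = ipf_independence(counts)
zero_cells = np.argwhere(fitted == 0)  # cells whose fitted mean is exactly 0
print(fitted)
print(zero_cells)
```

The cells listed in `zero_cells` are those whose means cannot be estimated as positive values from the data. In this special case they can be read directly from the zero margin; the thesis addresses general log-linear models, where zero cells can make the MLE fail to exist even when all fitted margins are positive.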
Keywords/Search Tags:Maximum likelihood estimation, MLE, Model, Data, Log-linear