
A Statistical Algorithm For Fitting Overdispersed Count Data

Posted on: 2022-06-24, Degree: Master, Type: Thesis
Country: China, Candidate: T Y Yao, Full Text: PDF
GTID: 2480306311950759, Subject: Statistics
Abstract/Summary:
Count data are discrete data whose values are non-negative integers, and they arise in many fields. Overdispersion is the property that the variance of a random variable is greater than its mean. Overdispersed count data are observed in research fields such as biological information, air quality, insurance, and earthquakes. To model such data statistically, it is common practice either to fit a Poisson distribution model or to transform the data, for example toward a Gaussian distribution, before inference. However, these methods may lead to larger deviations in the parameter estimates and even to wrong conclusions. Currently, the negative binomial distribution is a mainstream model for fitting overdispersed count data, and it has achieved good results in some application scenarios. However, when the parameters of models based on the negative binomial distribution are estimated by maximum likelihood, the M-step of the Expectation Maximization algorithm (EM algorithm) has no closed-form solution, so an additional inner loop of iterative numerical solution must be introduced, which makes the algorithm inefficient. In this paper, a new class of algorithms is explored that avoids this loop nesting in EM-like algorithms by constructing fictitious likelihood functions, and thus solves the inefficiency problem.

Based on the negative binomial model, three unsupervised classification models are examined: the mixture of Negative Binomial distributions model, the Negative Binomial Hidden Markov model, and the Negative Binomial Coupled Hidden Markov model. The EM algorithms with fictitious likelihood functions (EM-FL) applied to these models are validated for parameter point estimation by both numerical simulations and real datasets. The theorem proposed in this paper proves theoretically that the improved EM-FL algorithm converges. In addition, numerical simulations verify the convergence of the algorithm and investigate its accuracy and the influence of parameter values on convergence.

This paper also shows practical applications of the algorithm to traffic accident data, physical activity data, and air quality data. The EM-FL algorithm avoids the nested loop problem and could be used as an alternative to the existing generalized expectation maximization algorithm (GEM algorithm), which uses numerical solutions in the M-step. The ideas can be extended to more models, and the algorithm has potential applications in various research fields.
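
The nested loop that the abstract describes can be illustrated with a minimal sketch of the baseline approach, not of the thesis's EM-FL algorithm, whose fictitious likelihood construction is not given here. The sketch assumes a negative binomial parameterized by mean mu and dispersion r (variance mu + mu^2/r) and a two-component mixture. All function names, starting values, and search bounds are hypothetical choices for illustration. In the M-step the mixing weights and means have closed forms, but the dispersion r does not, so each outer EM iteration runs an inner one-dimensional numerical search (a golden-section search here, standing in for whatever solver a GEM implementation might use).

```python
import math
import random


def nb_logpmf(x, mu, r):
    """Log-pmf of a negative binomial with mean mu and dispersion r.

    Assumed parameterization: variance = mu + mu**2 / r, so a small r
    means strong overdispersion.
    """
    mu = max(mu, 1e-12)  # guard against log(0) for an all-zero component
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))


def fit_dispersion(xs, ws, mu, lo=1e-3, hi=1e3, iters=40):
    """Inner loop: golden-section search over log(r) that maximizes the
    weighted log-likelihood, since r has no closed-form update."""
    def objective(log_r):
        r = math.exp(log_r)
        return sum(w * nb_logpmf(x, mu, r) for x, w in zip(xs, ws))

    a, b = math.log(lo), math.log(hi)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = objective(c), objective(d)
    for _ in range(iters):
        if fc < fd:   # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = objective(d)
        else:         # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = objective(c)
    return math.exp((a + b) / 2.0)


def em_nb_mixture(xs, pis, mus, rs, outer_iters=30):
    """EM for a K-component NB mixture with a numerical M-step for r.

    Returns (pis, mus, rs, trace), where trace holds the observed-data
    log-likelihood evaluated at the start of each outer iteration.
    """
    pis, mus, rs = list(pis), list(mus), list(rs)
    k, trace = len(pis), []
    for _ in range(outer_iters):
        # E-step: responsibilities, computed with log-sum-exp for stability.
        resp, loglik = [], 0.0
        for x in xs:
            logs = [math.log(pis[j]) + nb_logpmf(x, mus[j], rs[j])
                    for j in range(k)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += norm
            resp.append([math.exp(v - norm) for v in logs])
        trace.append(loglik)
        # M-step: closed form for pi and mu, nested numerical loop for r.
        for j in range(k):
            ws = [row[j] for row in resp]
            total = sum(ws)
            pis[j] = total / len(xs)
            mus[j] = sum(w * x for w, x in zip(ws, xs)) / total
            rs[j] = fit_dispersion(xs, ws, mus[j])
    return pis, mus, rs, trace


def sample_nb(rng, mu, r):
    """Draw one NB variate as a gamma-Poisson mixture (Knuth's Poisson sampler)."""
    lam = rng.gammavariate(r, mu / r)
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


# Synthetic data from an assumed two-component mixture, for illustration only.
rng = random.Random(0)
data = [sample_nb(rng, 3.0, 2.0) if rng.random() < 0.5
        else sample_nb(rng, 20.0, 5.0) for _ in range(300)]
pis, mus, rs, trace = em_nb_mixture(data, [0.5, 0.5], [2.0, 25.0], [1.0, 1.0])
```

In this sketch every outer iteration evaluates the weighted log-likelihood dozens of times per component inside `fit_dispersion`. That inner cost is the inefficiency the abstract attributes to EM for negative binomial models with a numerical M-step.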
Keywords/Search Tags:Expectation Maximization Algorithm, Negative Binomial Distribution, Mixture Model, Hidden Markov Model, Coupled Hidden Markov Model