
Study on Dependencies of Overdispersed Count Data Based on Several Markov Processes

Posted on: 2023-07-04   Degree: Master   Type: Thesis
Country: China   Candidate: M X Li   Full Text: PDF
GTID: 2530306617467884   Subject: Statistics
Abstract/Summary:
Overdispersed count data are non-negative integer values whose variance exceeds their mean. They typically record how often an event occurs in a given period and arise in many applied and scientific fields. The negative binomial distribution is the most popular model for overdispersed counts. However, multivariate count data are increasingly common in practice, and such data exhibit several kinds of dependence, namely temporal dependence, spatial interaction and correlation between variables, which common statistical models cannot adequately capture. Ignoring overdispersion or the dependence structure across time, space and variables reduces model accuracy and can even lead to wrong decisions. Building models tailored to overdispersed count data that account for these three kinds of dependence is therefore expected to substantially improve the accuracy of parameter inference and classification.

A Markov process is a classical stochastic process used to describe the dependence of variables over time. Two classic models built on Markov processes are the Hidden Markov Model and the Coupled Hidden Markov Model, which can model time series and data with spatio-temporal structure, respectively. To analyze the temporal dependence, spatial interaction and variable correlation of overdispersed count data, this thesis has three parts: statistical modeling, parameter estimation and practical application.

(1) Statistical modeling. This thesis selects several classical Markov processes and combines them with the negative binomial distribution and the multivariate Poisson log-normal distribution to fit the three kinds of dependence in count data across time, space and variables. Specifically, the temporal structure is handled by the Markov chain, the spatial structure is described by the coupling between Markov chains, and the relationship between variables is captured by the conditional emission distribution.

(2) Parameter estimation. This thesis adopts approaches such as the Expectation Maximization algorithm, the fictitious likelihood function, Variational Inference and Bayesian sampling to deal with the statistical and computational problems caused by model complexity. Two algorithms are proposed according to the structural characteristics of the models: the Variational Fictitious Likelihood Expectation Maximization algorithm for the Coupled Hidden Markov Model based on the negative binomial distribution, and the Monte Carlo Expectation Maximization algorithm for the Hidden Markov Model based on the multivariate Poisson log-normal distribution.

(3) Practical application. The theoretical investigation is accompanied by applications of the new models to air pollution data. A better understanding of the interdependent structures hidden in air pollution data is vital for accurate classification and prediction of air quality states. It can provide theoretical support for the joint treatment of air pollution and help people plan their travel so as to reduce their exposure to air pollution.
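
As a rough illustration of the simplest building block mentioned above, a Hidden Markov Model with negative binomial emissions, the following Python sketch evaluates the log-likelihood of a univariate count series with the scaled forward algorithm. It is not the thesis's estimation procedure (no coupling between chains, no variational or Monte Carlo step), and every name and number in it (`nb_logpmf`, `nb_mean_var`, `forward_loglik`, the two states and their parameters) is a hypothetical choice made for this example. The negative binomial is parameterized by a size `r` and a success probability `p`, so that the mean is `r(1-p)/p` and the variance is the mean divided by `p`, which is always larger than the mean.

```python
import math


def nb_logpmf(k, r, p):
    """Log-pmf of a negative binomial with size r > 0 and success probability 0 < p < 1."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_mean_var(r, p):
    """Mean and variance under the same parameterization; variance = mean / p > mean."""
    mean = r * (1.0 - p) / p
    return mean, mean / p


def forward_loglik(counts, init, trans, params):
    """Log-likelihood of a count series under an HMM with negative binomial emissions.

    init[s]     : initial probability of hidden state s
    trans[i][j] : probability of moving from state i to state j
    params[s]   : (r, p) emission parameters of state s
    Uses the scaled forward recursion; extreme counts could still underflow
    in this simple version.
    """
    n_states = len(init)
    alpha = [init[s] * math.exp(nb_logpmf(counts[0], *params[s]))
             for s in range(n_states)]
    loglik = 0.0
    for t in range(len(counts)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * math.exp(nb_logpmf(counts[t], *params[j]))
                for j in range(n_states)
            ]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


# Hypothetical two-state example ("low" and "high" count regimes).
counts = [2, 1, 0, 9, 12, 8, 1, 3]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
params = [(2.0, 0.5), (5.0, 0.3)]
print(forward_loglik(counts, init, trans, params))
```

An Expectation Maximization fit would wrap this recursion with a backward pass and parameter updates. The coupled and multivariate models described in the abstract need the additional approximations named there.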
Keywords/Search Tags: Overdispersed count data, Hidden Markov Model, Negative Binomial distribution, Multivariate Poisson Log-normal distribution, Expectation Maximization algorithm, Variational Inference, Markov Chain Monte Carlo algorithm, Fictitious Likelihood Function