
Statistical Inference For Contingency Tables With Missing Data And A Semiparametric Non-linear Dynamic Factor Model

Posted on: 2016-04-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X G Luo
GTID: 1220330470956480
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of modern science and technology, categorical data of many kinds have become increasingly common. Statistical inference based on such data is an important topic in biomedical research. A special case is the R×C contingency table with missing data. In particular, 2×2 tables with missing data have attracted considerable attention from researchers. This dissertation studies statistical inference for data of this kind. Nonlinear dynamic factor analysis models (NDFAMs) are a widely used class of statistical models. Because NDFAMs are difficult to fit with traditional methods alone, we also address modeling and Bayesian inference for NDFAMs. The main contents are as follows.

1. We consider, from a small-sample viewpoint, the two-sided equivalence hypothesis with asymmetric equivalence margins (-δ0, δ1) on the difference between proportions in an incomplete matched-pair design, under the assumption that the missing-data mechanism depends on treatment but not on outcome. We investigate the likelihood ratio statistic, the score statistic, and two Wald-type statistics for testing equality of two correlated proportions in the presence of incomplete data under this assumed non-random missing-data mechanism. Based on the proposed test statistics, we develop three procedures for computing the p-value of the test: an asymptotic method, an approximate unconditional method, and a bootstrap resampling method. Our simulation studies show that the asymptotic method produces an inflated type I error rate in small-sample or sparse-data problems, whereas the approximate unconditional method and the bootstrap resampling method usually yield type I error rates close to the prespecified significance level. Compared with the asymptotic and bootstrap resampling methods, the approximate unconditional method based on the score statistic generally (i) has a type I error rate closer to the significance level, (ii) has higher power while controlling the type I error rate, and (iii) is computationally much simpler. Hence, we recommend the approximate unconditional method based on the score statistic in practice.

2. We also study the equivalence problem of Chapter 2 from a Bayesian perspective. The idea is to recast it as a hierarchical model written in the Stan language; using Stan's sampler, we obtain parameter estimates (including Bayesian p-values) and density estimates. The main strength of this method is that it provides a versatile solution for statistical inference on contingency tables with incomplete data. Modeling in Stan is highly flexible: inference for other contingency tables requires only a few changes to the model.

3. We propose five confidence intervals (CIs) for the difference in sensitivity between two continuous-scale diagnostic tests with the two specificities held at fixed levels. The intervals are based on generalized pivotal quantities, a hybrid approach combined with the Wilson score method and the Agresti-Coull method, and bootstrap resampling. Simulation studies are conducted to compare the finite-sample performance of the five proposed intervals in terms of coverage probability, expected interval width, and left- and right-tail error rates. Our empirical results show that the hybrid method with the Agresti-Coull interval outperforms the existing methods for small to moderate sample sizes even when the underlying distribution is misspecified, and that the generalized pivotal quantity method behaves satisfactorily when the underlying distribution is correctly specified.

4. Software developed for Bayesian analysis of structural equation models can be applied to NDFAMs when the dynamic parameters are assumed to be normally distributed. Bayesian inference for NDFAMs is challenging when the distribution of the dynamic parameters is unknown. We present user-friendly RStan code that implements a semiparametric Bayesian analysis of NDFAMs in which the unknown distribution of the dynamic parameters is modeled with a truncated Dirichlet process prior. The Stan-based Bayesian approach to NDFAMs provides a general framework for analyzing more complicated hierarchical models.

In summary, for statistical inference on categorical data with missing values, we study frequentist methods and also consider Bayesian inference with a truncated Dirichlet process prior. Each approach has its own advantages and disadvantages, but on grounds of simplicity, effectiveness, accuracy, and portability we recommend the Bayesian approach. As the dimension of categorical data increases, computational complexity and running time increase substantially. To address this difficulty, we recast the structure of the data as a hierarchical model, which makes the analysis less difficult and the modeling easier. Statistical inference for hierarchical models with RStan is convenient and effective, and it also serves as a general framework.
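For readers unfamiliar with the two single-proportion intervals named in item 3, the following Python sketch shows their standard textbook forms. It is an illustration only: the function names and the example counts are assumptions, and the abstract does not say how the dissertation combines these intervals into a hybrid interval for a sensitivity difference, so that step is not reproduced here.

```python
import math
from statistics import NormalDist


def wilson_interval(x, n, level=0.95):
    """Wilson score interval for a binomial proportion with x successes in n trials."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def agresti_coull_interval(x, n, level=0.95):
    """Agresti-Coull interval: a Wald-type interval around the Wilson center."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_adj = n + z**2                 # adjusted sample size
    p_adj = (x + z**2 / 2) / n_adj   # adjusted proportion (equals the Wilson center)
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1] because the unclipped interval can leave the unit interval.
    return max(0.0, p_adj - half), min(1.0, p_adj + half)
```

Calling `wilson_interval(8, 10)` and `agresti_coull_interval(8, 10)` on the hypothetical counts of 8 successes in 10 trials gives two intervals with the same midpoint, with the Agresti-Coull interval at least as wide as the Wilson interval.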
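The truncated Dirichlet process prior mentioned in item 4 is commonly built by stick-breaking with a finite number of components. The Python sketch below draws one set of mixture weights under that standard construction. It is not the dissertation's RStan code, which is not shown in this abstract; the function name, the concentration parameter `alpha`, and the truncation level `K` are hypothetical choices for illustration.

```python
import random


def truncated_stick_breaking(alpha, K, rng):
    """Draw K mixture weights from a stick-breaking prior truncated at K components.

    v_k ~ Beta(1, alpha) for k < K and v_K = 1, so the weights sum to one.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for k in range(K):
        # The last break takes the whole remaining stick (the truncation step).
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights
```

For example, `truncated_stick_breaking(2.0, 10, random.Random(0))` returns ten nonnegative weights that sum to one. Smaller values of `alpha` concentrate the mass on the first few components.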
Keywords/Search Tags: R×C contingency table, approximate unconditional test, difference of sensitivity, Wilson score interval, nonlinear dynamic factor analysis models