
Parameter Estimation Based On GEEs Of The Data From Family-based Cluster Sampling

Posted on: 2009-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Cheng
Full Text: PDF
GTID: 2144360245977872
Subject: Epidemiology and Health Statistics
Abstract/Summary:
In studies that use family-based cluster sampling, observations of the variable of interest are correlated among members of the same family. Such data violate the independence assumption that classical statistical methods require, so special methods are needed. If classical methods are applied anyway, the validity and statistical properties of the resulting parameter estimates are affected, and the statistical conclusions are biased. W. G. Cochran discussed the analysis of this kind of data, but because his formula is complex, the method has not been widely applied. Generalized estimating equations (GEEs), proposed by Liang & Zeger in 1986, extend generalized linear models and provide a unified approach to the analysis of dependent data. They can model dependent variables with a range of distributions, such as the normal, binomial, and Poisson distributions.

The intra-family correlation must be taken into account when estimating parameters from family-based cluster sampling data. A simulation study was used to compare the estimation accuracy and confidence interval coverage of several parameter estimation methods for such data. The methods were compared on data simulated with different patterns of correlation among family members and different intra-family correlation coefficients. For each combination of design parameter values, 2000 data sets were generated.
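The kind of coverage comparison described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the thesis's simulation code: it assumes equal family sizes, an exchangeable correlation generated by a shared family effect, and unit total variance, and it uses far fewer replicates than the 2000 data sets per setting reported above. The function names (`simulate_families`, `naive_ci`, `cluster_robust_ci`, `coverage`) are hypothetical. The cluster-robust interval uses the sandwich variance of an intercept-only GEE with an independence working correlation, which is one simple GEE-type estimator and not necessarily the one used in the thesis.

```python
import math
import random


def simulate_families(n_families, family_size, icc, rng, mean=0.0):
    """Normal outcomes with exchangeable intra-family correlation `icc`.

    Assumption: each outcome = mean + shared family effect + individual noise,
    with total variance 1, so corr(y_ij, y_ik) = icc within a family.
    """
    data = []
    for _ in range(n_families):
        shared = rng.gauss(0.0, math.sqrt(icc))  # family-level effect
        data.append([mean + shared + rng.gauss(0.0, math.sqrt(1.0 - icc))
                     for _ in range(family_size)])
    return data


def naive_ci(data, z=1.96):
    """95% CI for the mean that treats all individuals as independent."""
    values = [y for fam in data for y in fam]
    n = len(values)
    ybar = sum(values) / n
    s2 = sum((y - ybar) ** 2 for y in values) / (n - 1)
    se = math.sqrt(s2 / n)
    return ybar - z * se, ybar + z * se


def cluster_robust_ci(data, z=1.96):
    """95% CI for the mean using a cluster-robust (sandwich) standard error."""
    n = sum(len(fam) for fam in data)
    ybar = sum(y for fam in data for y in fam) / n
    # Sum residuals within each family, square, then sum over families.
    meat = sum(sum(y - ybar for y in fam) ** 2 for fam in data)
    se = math.sqrt(meat) / n
    return ybar - z * se, ybar + z * se


def coverage(icc, ci_fn, n_sims=500, n_families=100, family_size=4, seed=1):
    """Fraction of simulated data sets whose CI contains the true mean (0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        lo, hi = ci_fn(simulate_families(n_families, family_size, icc, rng))
        hits += lo <= 0.0 <= hi
    return hits / n_sims


if __name__ == "__main__":
    for icc in (0.0, 0.2, 0.5):
        print(icc, coverage(icc, naive_ci), coverage(icc, cluster_robust_ci))
```

Running the script prints, for each intra-family correlation, the empirical coverage of the independence-based interval next to that of the cluster-robust interval, so the two can be compared as the correlation grows.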
The study consists of the following three parts:
(1) Assessment of the GEE estimate of the population mean. Normally distributed family-based sampling data generated by Monte Carlo simulation were used to compare the coverage probabilities, at the 95% nominal level, of confidence intervals from three mean estimation methods: GEEs, the formula method, and the direct calculation method.
(2) Assessment of the GEE estimate of the population rate. Using simulated family-based binary data, the coverage probabilities of the 95% CIs for the population rate from GEEs, the formula method, and the logistic model were compared.
(3) Case study. For illustration, data from an epidemiological survey on hypertension and related factors were analyzed using GEEs, the formula method, and the logistic model.

The main results are as follows:
(1) When individuals within a family are independent, the coverage probabilities of the 95% CIs are close to the 95% nominal level for all three methods: the direct calculation method (or logistic model), the formula method, and GEEs.
(2) When the intra-family correlation is greater than 0, the coverage probabilities of the 95% CIs remain close to the nominal level when the formula method or GEEs are used.
By contrast, the coverage probability of the direct calculation method (or logistic model) decreases as the intra-family correlation becomes stronger, and the decrease depends not only on family size but also on the structure and strength of the intra-family correlation.
(3) The larger the intra-family correlation, the more necessary it is to use GEEs or the formula method to account for non-independence, because the estimates from the direct calculation method (or logistic model) are then seriously distorted.

Based on these results, the following suggestions are made for parameter estimation from family-based cluster sampling data:
(1) Because data from family-based cluster sampling have a hierarchical structure, confidence intervals obtained with conventional methods that assume independence are too narrow. Suitable statistical methods are needed to obtain reliable results, especially when the intra-family correlation coefficient is large.
(2) Although GEEs and the formula method give similar estimates, GEEs have further advantages: they can adjust for covariates and are supported by software packages such as SAS and STATA. GEEs are therefore recommended for parameter estimation from family-based cluster sampling data.
(3) Limitation of GEEs. If families are nested in larger clusters, such as communities, the intra-community correlation should also be considered. The data then have a three-level hierarchical structure: community, family, and individual. A multilevel model is preferred when there are more than two levels; GEEs are suitable only for two-level structures.
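The statement that independence-based confidence intervals are too narrow can be made concrete with a standard large-sample approximation. The sketch below is not taken from the thesis: it assumes equal family size m and an exchangeable intra-family correlation rho, in which case the variance of the sample mean is inflated by the design effect 1 + (m - 1) * rho, and the nominal 95% interval is too short by the square root of that factor. The function names are hypothetical.

```python
import math


def design_effect(family_size, icc):
    """Variance inflation of the mean under exchangeable correlation `icc`
    (assumes every family has the same size)."""
    return 1.0 + (family_size - 1) * icc


def naive_coverage(family_size, icc, z=1.96):
    """Approximate true coverage of a nominal 95% CI that ignores clustering.

    The naive standard error is too small by sqrt(design effect), so the
    interval covers with probability P(|Z| <= z / sqrt(deff)), Z ~ N(0, 1).
    """
    shrink = math.sqrt(design_effect(family_size, icc))
    return math.erf(z / shrink / math.sqrt(2.0))


if __name__ == "__main__":
    for m in (2, 4, 6):
        for rho in (0.0, 0.2, 0.5):
            print(m, rho, round(naive_coverage(m, rho), 3))
```

Under these assumptions the approximate coverage equals the nominal level when rho is 0 and falls as either the family size or the correlation increases, which matches the direction of the simulation results summarized above.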
Keywords/Search Tags: family-based cluster sampling, parameter estimation, intra-family correlation, generalized estimating equations (GEEs)