
Comparison Of Generalized Linear Mixed Models (GLMM) And Logistic Regression In Complex Sampling For Analyzing Data Obtained From Stratified Cluster Random Sampling

Posted on: 2011-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: D P Chen
Full Text: PDF
GTID: 2154360305998096
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objectives: To study the application of generalized linear mixed models (GLMM) and complex-sampling logistic regression to data from stratified cluster random sampling, by analyzing a real example and running simulations that compare both methods with traditional logistic regression, using the corresponding SAS modules (SURVEYLOGISTIC, GLIMMIX and LOGISTIC). Specifically, the three methods are compared across scenarios obtained by varying the simulation parameters: the intra-class correlation coefficient (ICC), the specified coefficient of the individual-level variable, and the sampling probabilities.

Methods: First, the three statistical methods were applied to the example, and the results supplied the parameters for the simulation. The simulation has two parts. In part 1, the original population of the example was simulated and stratified cluster random sampling was carried out in 1000 independent replications; in each replication the sample was analyzed with logistic regression, complex-sampling logistic regression and GLMM. In part 2, building on part 1, the simulation parameters were varied (the ICC in each stratum, the specified coefficient of the individual-level variable, and the sampling probabilities), again with 1000 independent replications, each analyzed with the same three methods. The performance measures were bias, type I error rate, coverage of the 95% confidence interval, power and standard error.

Results: In the analysis of the example, the standard errors of the regression coefficients from surveylogistic and GLIMMIX were larger than those from logistic regression, and the 95% confidence intervals of the ORs were wider. In the simulation, surveylogistic and GLMM mainly affected the results for the group-level variables. Overall, GLMM controlled the type I error rate most strictly. At the group level, surveylogistic was consistent with GLMM in type I error rate, but at the individual level it performed the worst of the three methods. Its individual-level type I error rate was worst when the ICC was 0.1 in the larger stratum and 0.5 in the smaller stratum and the sampling probabilities in the two strata were equal. The group-level type I error rate increased as the ICC in the larger stratum increased. Coverage of the 95% confidence interval was highest for GLMM at both the individual and group levels, and lowest for logistic regression. Coverage was also affected by the sampling probabilities and by the specified coefficient of the individual-level variable, and there was an interaction between the ICC in the larger stratum and the statistical method in their effect on coverage. The differences in bias among the three methods were not significant at either the group or the individual level, although GLMM had the largest effect on bias.

Conclusions: Traditional logistic regression is not suitable for data from stratified cluster random sampling. Such data should be considered from two perspectives. At the individual level, GLMM is the best method; if the ICCs in both the larger and smaller strata are small, traditional logistic regression can be used, but complex-sampling logistic regression should not be used. At the group level, traditional logistic regression underestimates the standard errors of the parameters, which makes its significance tests too liberal, so GLMM and complex-sampling logistic regression are more appropriate. Because complex-sampling logistic regression is simpler to compute and takes less time, it is recommended at the group level when complete information on the sampling frame is available.
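The data-generating step described in the Methods can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's SAS code: it assumes a random-intercept logistic model, assumes the ICC is defined on the latent scale as sigma_u^2 / (sigma_u^2 + pi^2/3), and uses hypothetical names (`random_intercept_sd`, `simulate_stratum`, `stratified_cluster_sample`) and made-up population sizes and coefficients. Only the ICC values 0.1 and 0.5 and the equal sampling probabilities are taken from the scenario in the abstract.

```python
import math
import random


def random_intercept_sd(icc):
    """SD of the cluster random intercept for a given latent-scale ICC.

    Assumes ICC = sigma_u^2 / (sigma_u^2 + pi^2 / 3), the usual latent-variable
    definition for a logistic model (an assumption, not stated in the abstract).
    """
    return math.sqrt(icc * (math.pi ** 2 / 3) / (1.0 - icc))


def simulate_stratum(n_clusters, cluster_size, icc, beta0, beta_ind, beta_grp, rng):
    """Simulate one stratum as a list of clusters of (x, z, y) records."""
    sd_u = random_intercept_sd(icc)
    clusters = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sd_u)   # cluster-level random intercept
        z = rng.randint(0, 1)      # group-level covariate (hypothetical)
        members = []
        for _ in range(cluster_size):
            x = rng.gauss(0.0, 1.0)  # individual-level covariate (hypothetical)
            eta = beta0 + beta_ind * x + beta_grp * z + u
            p = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < p else 0
            members.append((x, z, y))
        clusters.append(members)
    return clusters


def stratified_cluster_sample(population, fractions, rng):
    """Draw whole clusters at random within each stratum.

    Each sampled record carries a design weight equal to the inverse of the
    cluster selection probability in its stratum.
    """
    sample = []
    for stratum, clusters in population.items():
        k = max(1, round(fractions[stratum] * len(clusters)))
        weight = len(clusters) / k
        for idx in rng.sample(range(len(clusters)), k):
            for x, z, y in clusters[idx]:
                sample.append({"stratum": stratum, "cluster": idx,
                               "x": x, "z": z, "y": y, "weight": weight})
    return sample


rng = random.Random(2011)
# Illustrative sizes and coefficients; beta_grp = 0 mimics a type I error scenario.
population = {
    "larger": simulate_stratum(200, 30, 0.1, -1.0, 0.5, 0.0, rng),
    "smaller": simulate_stratum(50, 30, 0.5, -1.0, 0.5, 0.0, rng),
}
sample = stratified_cluster_sample(population, {"larger": 0.1, "smaller": 0.1}, rng)
```

In a full study of this kind, the sampling step would be repeated (1000 times in the thesis) and each sample passed to the three fitting methods; the model fitting itself is outside this sketch.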
Keywords/Search Tags:stratified cluster random sampling, surveylogistic regression, generalized linear mixed effects models