A Comparison Of Five Qualitative Data Statistical Analysis Methods

Posted on:2013-10-12

Degree:Master

Type:Thesis

Country:China

Candidate:X Guan

Full Text:PDF

GTID:2234330374961001

Subject:Epidemiology and Health Statistics

Abstract/Summary:


The CMH test, meta-analysis, the logistic regression model, the log-linear model and the weighted chi-square test are commonly used to analyze qualitative data, and in many cases they can be applied to the same type of data. Some scholars have found that the result of the CMH test is inconsistent with that of meta-analysis when both are used to analyze the center effect in multi-center clinical trials. Likewise, when the risk factors of some diseases are analyzed, the results of the log-linear model and logistic regression differ from each other.

Scholars at home and abroad have studied how to select an appropriate statistical analysis method and which method gives more reliable results. Examples include a comparative study of the CMH test and the logistic regression model for three-dimensional contingency tables with a nominal or ordinal response variable, and a comparative study of the testing powers of the CMH test and meta-analysis for q×2×2 three-dimensional contingency tables from multi-center clinical trials. However, current research on this subject is not comprehensive enough, and the evaluation criteria used are often too simple.

Considering the current situation, this paper uses Monte Carlo simulation to compare the above five methods in analyzing four types of three-dimensional contingency tables. The evaluation criteria include the type I error rate, testing power, parameter estimate and mean square error. The purpose of this study is to help researchers better choose statistical analysis methods when dealing with high-dimensional contingency tables.
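
As an illustration of the kind of Monte Carlo comparison described above, the following minimal Python sketch estimates the type I error rate and testing power of the CMH test on simulated q×2×2 tables. It is not the simulation code of the thesis: the data-generating design (independent binomial arms with a common odds ratio across strata), the function names and all parameter values (q = 3, 50 subjects per arm, control event rate 0.3, 2000 replicates) are assumptions chosen for illustration, and the CMH statistic is computed without continuity correction.

```python
import random

# 95th percentile of the chi-square distribution with 1 degree of freedom
CHI2_CRIT_DF1_05 = 3.841458820694124


def cmh_statistic(tables):
    """CMH chi-square (no continuity correction) for a list of 2x2 tables.

    Each table is (a, b, c, d): row 1 = treated (event, no event),
    row 2 = control (event, no event).
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n <= 1:
            continue  # a stratum this small carries no information
        n1, n2 = a + b, c + d          # row totals
        m1, m2 = a + c, b + d          # column totals
        diff += a - n1 * m1 / n        # observed minus expected count in cell a
        var += n1 * n2 * m1 * m2 / (n * n * (n - 1))
    return diff * diff / var if var > 0 else 0.0


def simulate_tables(rng, q, n_per_arm, p_control, odds_ratio):
    """Draw q strata with a common odds ratio (assumed design)."""
    odds = odds_ratio * p_control / (1 - p_control)
    p_treated = odds / (1 + odds)
    tables = []
    for _ in range(q):
        a = sum(rng.random() < p_treated for _ in range(n_per_arm))
        c = sum(rng.random() < p_control for _ in range(n_per_arm))
        tables.append((a, n_per_arm - a, c, n_per_arm - c))
    return tables


def rejection_rate(q, n_per_arm, p_control, odds_ratio, reps=2000, seed=1):
    """Share of simulated data sets in which the CMH test rejects at 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        tables = simulate_tables(rng, q, n_per_arm, p_control, odds_ratio)
        if cmh_statistic(tables) > CHI2_CRIT_DF1_05:
            rejections += 1
    return rejections / reps


# Odds ratio 1 gives the type I error rate; odds ratio 3 gives the power.
type1 = rejection_rate(q=3, n_per_arm=50, p_control=0.3, odds_ratio=1.0)
power = rejection_rate(q=3, n_per_arm=50, p_control=0.3, odds_ratio=3.0)
print(f"type I error ~ {type1:.3f}, power ~ {power:.3f}")
```

With the odds ratio set to 1 the rejection rate estimates the type I error rate and should fall near the nominal 0.05; with an odds ratio above 1 it estimates the testing power. The other four methods could be compared in the same loop by applying each of them to the same simulated tables.
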
As this paper chiefly investigates the above five methods, other methods that can also be used in some cases are not covered. The main content and conclusions of this paper are as follows.

⑴ To compare the type I error rates, testing powers, parameter estimates and mean square errors of the CMH test, the logistic regression model, the log-linear model and the weighted chi-square test in analyzing q×2×2 contingency tables.

The type I error rates of the four methods are almost the same. When there is no empty cell in the contingency table, the four methods obtain the same testing power. The testing powers of the logistic regression model, the log-linear model and the weighted chi-square test decrease when there are empty cells. When the testing powers of all four methods are 1, the logistic regression model has the largest parameter estimate and mean square error among the four, followed by the log-linear model; when the population parameter is relatively small, the logit estimate of the CMH test is closest to the population parameter and has the smallest mean square error. When the population parameter is relatively large, the estimate of the weighted chi-square test is closest to the population parameter and has the smallest mean square error. When the interaction of the factors is included in the model, the logistic regression model and the log-linear model obtain the same testing power, and their parameter estimates are close to each other.

⑵ To compare the type I error rates, testing powers, parameter estimates and mean square errors of the CMH test, the logistic regression model and the log-linear model in analyzing three-dimensional contingency tables with nominal independent and dependent variables.

The type I error rates of the three methods are almost the same.
When there is no empty cell in the contingency table, the three methods obtain the same testing power. The testing powers of the logistic regression model and the log-linear model decrease when empty cells exist. The CMH test cannot estimate the parameters of the model. When the testing powers of the logistic regression model and the log-linear model are both 1, the group-effect estimate of the log-linear model is closer to the population parameter, while the estimate of the logistic regression model is higher than the population parameter. When the interaction of the factors is included in the model, the parameter estimate of the logistic regression model is the closest to the population parameter.

⑶ To compare the type I error rates and testing powers of the CMH test and the logistic regression model in analyzing three-dimensional contingency tables with an ordinal response variable.

The type I error rate and the testing power of the CMH test are higher than those of the logistic regression model. The reason is that the score test for the proportional odds assumption must be performed before the logistic regression model is used to analyze a three-dimensional contingency table with an ordinal response variable.
If the result of the score test for the proportional odds assumption is ignored, the testing powers of the two methods are the same.

⑷ To compare the type I error rates, testing powers, parameter estimates and mean square errors of the CMH test and meta-analysis in analyzing q×2×2 contingency tables from multi-center clinical trials; to compare the type I error rates and testing powers of the Breslow-Day test and the Q test in the homogeneity test; and to compare the statistic I² in terms of its probability of correctly judging heterogeneity, or the correct rate.

The comparison of the homogeneity tests shows that the type I error rate and the testing power of the Breslow-Day test are higher than those of the Q test. When the number of centers is small and the significance level is set to 0.05, the correct rate of the statistic I² is higher than the testing powers of the Breslow-Day test and the Q test. As the number of centers increases, the correct rate of the statistic I² becomes lower than the testing powers of the Breslow-Day test and the Q test.

The comparison of testing power shows that, for contingency tables whose population follows the fixed effect model, the CMH test has a higher type I error rate but a lower testing power than meta-analysis, because the Breslow-Day test, which belongs to the CMH test, has a higher type I error rate in the homogeneity test. When the homogeneity test results of the Breslow-Day test and the Q test are both negative, the CMH test and meta-analysis achieve the same testing power.
However, when the population follows the random effect model, the result of the CMH test is not reliable; therefore the two methods are not compared in that case.

When the population OR is small, the means of the parameter estimates of the logit estimation and the MH estimation of the CMH test and of meta-analysis are close to the population parameter, and the mean square errors are all small. As the population OR increases, the MH estimate of the CMH test remains close to the population parameter; however, the estimates of the logit estimation and meta-analysis are smaller than the population parameter, and their mean square errors are relatively large.

The comparative study suggests that the following aspects may be taken into consideration when selecting an appropriate statistical analysis method for qualitative data.

⑴ Correctly determine the type of the data. Researchers need to observe and analyze the variables of the data and their characteristics.

⑵ Fully understand the source of the data. If the data are from a multi-center clinical trial, the CMH test or meta-analysis can be adopted.

⑶ Observe the frequencies of the cells. When there are empty cells in the high-dimensional contingency table, add 0.5 to each cell before analysis.

⑷ Select a method that is relatively simple in its analysis process or its results. If only the hypothesis test is performed, the CMH test, whose testing power is as high as that of the other methods, is an effective and simple method. For q×2×2 contingency tables from multi-center clinical trials, the CMH test and meta-analysis should be used together, since the Breslow-Day test achieves a higher testing power and a higher type I error rate than the Q test in analyzing heterogeneity. If parameter estimation is also required, researchers should select statistical analysis methods based on the type of data and the above comparison results.
For general q×2×2 contingency tables, if only one factor effect needs to be estimated, the logit estimation of the CMH test is recommended. For high-dimensional contingency tables with nominal independent and dependent variables, although the log-linear model can better estimate the experimental effect, its result is more comprehensive than that of the logistic regression model. Therefore, the logistic regression model is recommended when the independent and dependent variables of the data are clear. For q×2×2 contingency tables from multi-center clinical trials, the MH estimation of the CMH test is recommended only when the result of the homogeneity test is negative. If heterogeneity exists among the centers, meta-analysis is recommended, and the random effect model may be adopted to estimate the experimental effect.
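
The pooled estimators and heterogeneity measures named in the abstract can be sketched in a few lines of Python. The sketch below uses the standard textbook forms of the Mantel-Haenszel (MH) odds ratio, the inverse-variance logit estimate, Cochran's Q and I²; it is not taken from the thesis, the function names and the two example tables are hypothetical, and applying the 0.5 correction only to strata that contain an empty cell is an assumption about how the recommendation is meant to be applied.

```python
import math


def correct_empty_cells(table, add=0.5):
    """Add 0.5 to every cell of a 2x2 table that has an empty cell.

    Assumption: the correction is applied per stratum, only when needed.
    """
    if 0 in table:
        return tuple(x + add for x in table)
    return tuple(float(x) for x in table)


def mh_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio over strata (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den  # undefined if every stratum has b * c == 0


def logit_pooled(tables):
    """Inverse-variance (logit) pooled odds ratio, Cochran's Q and I² in %.

    All cells must be positive; apply correct_empty_cells first if not.
    """
    log_ors = [math.log(a * d / (b * c)) for a, b, c, d in tables]
    weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * t for w, t in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, log_ors))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled), q, i2


# Hypothetical two-center data with the same odds ratio in both centers.
raw = [(10, 10, 5, 15), (20, 20, 10, 30)]
tables = [correct_empty_cells(t) for t in raw]

mh_or = mh_odds_ratio(tables)
logit_or, q_stat, i2 = logit_pooled(tables)
print(f"MH OR = {mh_or:.3f}, logit OR = {logit_or:.3f}, "
      f"Q = {q_stat:.3f}, I2 = {i2:.1f}%")
```

In this example both centers have an odds ratio of 3, so the two pooled estimates agree and Q and I² indicate no heterogeneity. When Q or I² points to heterogeneity among the centers, the abstract's recommendation is to move to a random effect meta-analysis, which this sketch does not implement.
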

Keywords/Search Tags:

CMH test, meta-analysis, logistic regression model, log-linear model, weighted chi-square test, Monte Carlo simulation, type I error, testing power, parameter estimate, mean square error


Related items

1	The Study Of EDC Building And Simulation, Forecast & Evaluation Methods For Adaptive Clinical Trials
2	Effect Of Difference Censored Rates Between Groups In Clinical Trails
3	Evaluation Of The Comparative Methods Of Independent Samples
4	Modeling And Analysis Of Motion Error Of Master-slave Surgical Robot
5	Comparative Simulation Study On The IPCW Method In One Way Crossover Design Survival Data
6	Causal Mediation Analysis Of Survival Outcome With Multiple Mediators
7	Investigation Analysis And Model Selection Of The Prevalence Of Hypertension In Lanzhou City
8	A Simulation Study Of Logistic Regression And Rare Events Logistic Regression Model
9	Construction And Application Of SNP Microarray Database And The Related Analysis Tools
10	Research And Practice Of Statistical Analysis In Disease Monitoring