Font Size: a A A

Application And Evaluation Of Excel Programs In Epidemiological And Statistical Data Analysis

Posted on:2010-07-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:Q S ChenFull Text:PDF
GTID:1224360278974755Subject:Epidemiology and Health Statistics
Abstract/Summary:PDF Full Text Request
With the developments in science and technology,epidemiological-statistical analysis methods and calculating instruments(including computing hardware and software) are constantly improving,and along with a wide range of application,they have an ever more obvious role in the field of medical research and indeed could be considered virtually indispensable.Calculation methods,new calculation theories and sophisticated calculation models are being published and improved continuously, such as evidence-based medicine theory,the method for meta analysis,logistic regression models.As a means of calculation,computer and computer software technologies have developed significantly and at present,the most famous software applications are SAS and SPSS.These applications have been developed and since their introduction have become the world’s most popular and authoritative analysis software.They are constantly updated and upgraded.Their advantage lies in merging data management and data analysis into an integral whole,achieving data exchange, data management and data analysis.However,their most prominent feature is the data analysis function,including the ability to calculate a variety of indicators,different statistical methods of analysis and the number of available charts.Because of the SAS and SPSS sales strategy,the whole interface in English,in particular,the very large statistical analysis contents and function,presents difficulties for the majority of workers in medical research.In addition,some of the commonly used epidemiological data analysis methods,such as sample size calculation,RSR,circular distribution,sequential testing,Delphi method and meta analysis in SAS and SPSS do not have a readily available calculation module and the need for additional programming causes much inconvenience to users.Microsoft Excel is one branch of electronics table systems of the Microsoft Company under the Windows environment.As one member of Office to transact software,it is currently applied more extensively to form processing.Compared with professional statistics software,Excel can manage database,process data and make charts.Its characteristics lie in having more extensive applications,as it is bind-saled with Office software,exchanging Chinese and English smoothly,with simple operation and intuitive browsing;displaying the output simultaneously as the data is inputted.For example,in our country,Yong-Feng Li,Yong-Jian Hao analyzed the incidences of tuberculosis from 2001 to 2005 in a number of troops.Liang analyzed the epidemiological feature of SARS in the Dongcheng District of Beijing.Jun-Hong Sun also described the incidences of infectious diseases.Cai-Hong Huang analyzed the age characteristics of measles suffers in Quanzhou.Guo Zhao analyzed the epidemic outbreak of varicella in Pingdingshan in 2006.Tai-Hao Liu analyses trends of polio in Laiwu.Xing-Guo Li forecasts the trend of viral hepatitis in the Guangzhou Military Region.Qing-Shan Chen finished the forecast and evaluation of sexually transmitted diseases with the GM model.Wei-ping Luo analyzed the cause of death and potential years of life lost among residents in five counties of Xinjiang.Yun-jie Dai analyzed the causes of death of 794 inpatients over the age of 60.Fang Fang analyzed the spectrum of diseases in new rural cooperative medical disease patients in Nansha District,Guangzhou.Abroad,Korochina screened the patients with metabolic syndrome and with suspected rheumatic disease.Tovar analyzed the incidence of gastric cancer in Mexico from 1980 to 1997.Mousavi analyzed the causes of breast cancer of women in Tehran,the capital of Iran.Ngoan analyzed the spectrum of causes of death in cancer patients in Vietnam from 2005 to 2006.There was also some research which analyzed the relationship between diseases and related factors with Excel’s programs,such as,stomatitis and drinking water quality,macular disease and nutritional status,stomach and rectal cancer and environmental factors, congenital malformations in preterm infants and the size of the thyroid of their mothers,and so on.Specific methods of calculation,that can be done in Excel include:①design of experiments,for instance sample size estimate.②descriptive statistics,such as maximum,minimum,central tendency and dispersion tendency.③statistical analysis,including two methods.The first application of data analysis tools facilitates the creation of statistical chart,curve fitting,weight coefficient calculation, dose-response data analysis,decomposition of time series factor,t test,u test,analysis of variance,correlation and regression analysis,multiple linear regression,chi square test,spearman rank correlation,fourier analysis and so on.The preparation of the statistical analysis programs include life table,circular distribution analysis,path analysis,equivalent test,multiple correlation analysis,survival analysis,meta analysis,clustering analysis(fuzzy clustering analysis and R clustering analysis),gray model,sequential test,discriminant analysis,trend forecast,factor analysis,Cox regression model,and logistic regression.For this reason some scholars have pointed out that Excel can be used as a new tool in clinical statistical analysis.The article describes some Office software programs and Excel functions and formulas,etc,which can be commonly used in epidemiology statistical analysis and some of advanced methods which may be used to provide many medical researchers with a convenient and intuitive Chinese calculation and analysis tool.Obviously issues can be resolved using more authoritative statistics software but these are complicated to operate,inconvenience and difficult to popularize and promote without a Chinese language version.Most epidemiological researchers need not understand the complicated calculation process,as long as they can select specific analytical methods by which data can be analyzed and they can get the necessary result immediately.The aim of the research is to simplify the operation interface,make accurate results,adapt to the mass.It will provide a basis for the subsequent further development of related software and applications.In this paper,the following programs of epidemiology and statistical analysis have been done in Excel:The Excel descriptive statistics program:calculation of mean,standard deviation, the number of cases,maximum,minimum,range,median,geometric mean, coefficient of skewness,coefficient of kurtosis,ug1,ug2,D value of normal test, coefficient of variation,P2.5,P25,P75,P97.5,95%confidence interval(95%C.I.L~95% C.I.U);converting raw data into frequency distribution table,and drawing frequency distribution chart.The Excel t test program:comparing the population mean with the sample’s mean,comparing the means of pairing samples,and comparing the means of two samples.The Excel analysis of variance program:analysis of variance of completely randomized design and analysis of variance of randomized block design.The Excel chi square test program:contingency table,McNemar test, Mantel-Haenszell in case-control study,Mantel-Haenszell in cohort study and R×C chi square test. The Excel Ridit analysis program:two groups and more of rank data.The Excel linear correlation and regression model program:linear correlation analysis,linear regression analysis(Pearson correlation) and Spearman rank correlation analysis.The evaluation of diagnostic tests program:calculation of the sensitivity, specificity,false positive rate,false negative rate,Youden’s index,likelihood ratio, agreement rate,predictive value,and drawing of ROC curve.The Excel sample size estimating program:sampling estimate of the population rate,hypothesis testing of the single population rate,hypothesis testing of completely randomized design,hypothesis testing of the population rates designed by matching, sampling estimates with the population mean,hypothesis testing of the single population mean designed by matching hypothesis test of the two population means by completely randomized design,sample size estimate of case-control studies and cohort studies.The Excel experimental design programs of random allocation:complete randomization,paired randomization,and block randomization.The Excel Life table program:abridged life table,life table to the cause of death, and Life Expectancy Table.The Excel gray model program:differential equation model of one band,one variable,that is GM(1,1) model.The Excel meta- analysis program:meta analysis of combined u,t,F,χ2,P value, the combined count data and the measurement data.In this paper,we compared,using examples,the calculation results of the above excel programs with those of SAS,SPSS,textbooks and literature.Excluding the rounding factor,the excel results were the same with those of SAS,SPSS or textbooks,in statistical describing(Mean、standard deviation、sum、maximum、 minimum、range、median、geometric mean、skewness、kurtosis,ug1、ug2、D value of normal test、coefficient of variation、P25、P75 and 95%confidence interval),T test(T value and P value),Analysis of variance(F value、P value and multiple comparison),χ2 value and P value of chi-square test(including chi-square test、corrected chi-square test and fisher’s exact test,u value、χ2 value and P value of Ridit analysis,linear regression and linear correlation(correlation coefficient、regression coefficient、regression coefficient hypothesis testing),diagnostic tests(sensitivity、specificity、false positive rate、false negative rate、Youden’s index、positive likelihood ratio、negative likelihood ratio、agreement rate、positive predictive value、negative predictive value、area under the curve,and the ROC curve),sample size(estimating the population rate in survey sampling、the comparison of sample rate and population rate、the comparison of two-sample rates in completely random design、estimating the population mean in survey sampling、the comparison of a single population mean in paired design、the comparison of two population means in completely random design、case-control study、cohort study),life-table(life expectancy),meta Analysis(P value、u value、t value and F value).However,the calculation of OR and RR value, et al,together with standard error,was different from those in textbooks or in RevMan.The reasons for this have not yet been verified.There was no standard for the calculation of the GM(1,1) model.Therefore we selected published literatures as references.Consequently,Many statistical methods could be applied in Excel as a program, including statistical description(Mean、standard deviation、sum、maximum、minimum、range、median、geometric mean、skewness、kurtosis、ug1、ug2、D value of normal test、coefficient of variation、P25、P75、95%confidence interval、frequency table and graph of frequency distribution),one-sample T test,paired-samples T test, independent-samples T test,one-way ANOVA,ANOVA of randomized block design, chi-square,continuity correction and fisher’s exact test;ridit analysis;linear correlation analysis,linear regression analysis(Pearson correlation) and rank correlation analysis;diagnostic tests,including sensitivity、specificity、false positive rate、false negative rate、Youden’s index、likelihood ratio、agreement rate、predictive value and ROC curve.The estimation of sample size of the population rate in sampling,the comparison of sample rate and population rate,the comparison of two sample rates in completely random design,estimation of the population mean in survey sampling,the comparison of a single population mean in paired design,the comparison of two population means in completely random design,case-control study, cohort study.Group in complete randomization,in paired,in block.Current life table, cause eliminated life table and active life expectant table.GM(1,1) model and Meta Analysis.The calculation results of the Excel program were basically consistent with the international statistical software,SAS and SPSS.This Excel program had a brief interface,operating simply.Thus,it could be widely used among the medical workers.Therefore,a feature of this paper is the documenting of accurate,convenient, intuitive programs which permit the exchange of Chinese and English smoothly through Excel functions and formulas,making up for the deficiencies of authoritative statistical software without Chinese version,the operation of which is too complex and inconvenience to promote or popularize.Some specific points should be noted in the applications①calculation should not be too complicated.In the calculation of logistic regression which deals with single factor,two factors or three factors,the program of logistic regression was constrained to a large degree because of iteration in which calculation of the workload increases geometrically with the increase of factors.So,the use professional statistics software is recommended to solve particularly complex issues. ②Rounding has an impact on results.The number of decimal places was set up within the format cell and can only be displayed in cell "value" rounded.Yet the "value" in computing has not been rounded up.This results in content which is inconsistent with the calculation.③VB programming.VB is the world’s most widely used and most efficient programming language,which has had powerful developments,has reliable performance,and practical small programs which can deal with practical problems,with fast,simple and effective features and can solve many problems that Excel itself can not solves.④programss "drop" can extend the issue of calculation.In the preparation of the programs,generally the entering space and the median calculation space need be reserved.But under special circumstances,we can complete the preparation of median programss and the calculation of results by the autofill "selected" and "drop-down" in the soft.⑤The existent issues to be resolved.We can not complete its programming without the calculating formula of SNK.Therefore,if we take q cutoff time programming table method to determine the cutoff of value q,the process seems to be a little tedious.In the calculation of exact probability,when the total number of cases was less than 40,the 20 different combinations tables were listed.When the "n" was larger,there might exist the overflow problem of calculation with the exact probability method.
Keywords/Search Tags:Excel, Epidemiology, Data, Statistics analysis
PDF Full Text Request
Related items