
Mean And Standard Deviation Estimation From Five Number Summary

Posted on: 2019-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: J D Shi
Full Text: PDF
GTID: 2394330542499896
Subject: Statistics
Abstract/Summary:
Meta-analysis has become increasingly popular over the past two decades, mainly owing to its wide application in evidence-based medicine. It is a simple and appropriate way to statistically combine data from multiple studies. Because meta-analysis can only handle summary statistics such as the mean and standard deviation, one usually needs to conduct a systematic review and extract the summary data from the clinical studies in the literature. For continuous outcomes, e.g., blood pressure and the amount of alcohol consumed, the sample mean and standard deviation are two commonly used statistics reported for evaluating the effectiveness of a certain medicine or treatment. We also note that, for certain reasons, the sample median, the first and third quartiles, and/or the minimum and maximum values are also frequently reported in the literature. To the best of our knowledge, no existing methods in meta-analysis can handle the sample median and the sample mean simultaneously. For example, when applying the fixed-effect model or the random-effects model, we can use only the sample mean and standard deviation to derive the overall effect size of the treatment.

A natural question is then how to deal with studies in the literature that report the sample median. In the early stages, researchers often excluded such studies from further analysis by labeling them as "Studies with no sufficient data" in the flow chart of study selection. Such an approach, however, is suboptimal, as it may lose valuable information in the literature. As a consequence, the final results are usually less reliable, in particular when the total number of studies is small or a large proportion of the studies report the sample median. There is therefore an increasing demand for new methods that convert the sample median data to the sample mean data for meta-analysis.

For ease of notation, let {a,q1,m,q3,b} denote the five number summary, where a is the sample
minimum, q1 is the first quartile, m is the sample median, q3 is the third quartile, and b is the sample maximum of the data. Let also n be the sample size of the study. Note that the five number summary may not be fully reported in clinical studies. In the special case where only {a,m,b} were reported, Hozo et al. were the first to provide a simple method for estimating the sample mean and standard deviation. Hozo et al., however, did not make sufficient use of the sample size n, so their estimators are either biased or non-smooth. To improve Hozo et al.’s method, Wan et al. proposed an unbiased estimator of the standard deviation, and Luo et al. proposed an optimal estimator of the sample mean. In another special case where only {q1,m,q3} were reported, Wan et al. also proposed a simple estimator of the sample mean and an unbiased estimator of the standard deviation. Luo et al. further improved the sample mean estimator of Wan et al. by using the sample size to derive optimal weights for the two components of the estimator. According to Google Scholar on 13 March 2018, Hozo et al. had been cited 2254 times and Wan et al. 301 times. These papers have clearly been attracting growing attention and playing an important role in meta-analysis.

When {a,q1,m,q3,b} were fully reported, Bland proposed estimators of the sample mean and standard deviation from the five number summary. As his method is essentially the same as Hozo et al.’s method, Bland’s estimators are also suboptimal and do not make sufficient use of the sample size. For instance, the sample mean estimator in Bland is (a + 2q1 + 2m + 2q3 + b)/8. Given that the data follow a symmetric distribution, the quantities (a + b)/2, (q1 + q3)/2, and m can each serve as an estimate of the sample mean. To obtain a final estimator, Bland applied the artificial weights 1/4, 1/2, and 1/4 to the three components, respectively. That is, the first and third components are treated equally and
both of them are considered only half as reliable as the second component. As this is not always the case, to improve the sample mean estimation, Luo et al. proposed the optimal estimator ω1(a + b)/2 + ω2(q1 + q3)/2 + (1 - ω1 - ω2)m, where ω1 = 2.2/(2.2 + n^0.75) and ω2 = 0.7 - 0.72/n^0.55 are the approximated optimal weights assigned to the respective components.

For the standard deviation estimation from the five number summary, Bland also provided an estimator by the inequality method as in Hozo et al. This estimator does not include any information from the sample size. The sample size, however, is an important indicator of the credibility of a set of data. Under a given distribution, statistics such as the median and the extreme order statistics approach their true values as the sample size n tends to infinity. In that case, estimators built from these order statistics are more credible. When the sample size is small, however, e.g., n = 5, even robust statistics such as the median have large variances. It is therefore important to incorporate the sample size into the estimators, especially for medical data. Noting this problem, Wan et al. combined two unbiased estimators of the standard deviation with equal weights as [(b - a)/ζ + (q3 - q1)/η]/2, where ζ = ζ(n) = 2Φ^(-1)[(n - 0.375)/(n + 0.25)], η = η(n) = 2Φ^(-1)[(0.75n - 0.125)/(n + 0.25)], Φ is the cumulative distribution function of the standard normal distribution, and Φ^(-1) is the inverse function of Φ. Although this estimator takes the sample size into account, the range term and the interquartile range term cannot remain equally reliable as the sample size changes, so the equal weighting is not reasonable (see also the detailed reasons in Chapter 3).

According to Higgins & Green and Chen & Peace, the standard deviations play a crucial role in weighting the studies in meta-analysis. Inaccurate weights may lead to biased overall effect sizes and biased confidence intervals, and hence mislead physicians into providing patients with unreasonable or
even wrong medications. Noting the above problems, we follow the weighting approach of Luo et al. to construct a new standard deviation estimator. By minimizing the squared loss function, we derive an analytical form of the optimal weights, which is a function of the sample size n only. Plugging the optimal weights into the estimator yields the optimal standard deviation estimator. Because the squared loss function is not sufficiently robust, we also apply the absolute value loss function to determine the optimal weights. The two methods yield nearly equal optimal weights. Recall that the estimator is a weighted combination of two unbiased estimators which are also symmetrically distributed around the true value of the standard deviation; so is the weighted estimator. It is thus reasonable that the two methods yield nearly equal weights. It has also been proved that the optimal weights approach 0 as n tends to infinity. In other words, the weight assigned to the range term decreases as the sample size increases and eventually tends to 0. This is consistent with the instability and unreliability of the extreme order statistics. When the sample size is large enough, researchers should rely only on the interquartile range to estimate the standard deviation. The unbiasedness of the standard deviation estimator is also established. Note that the optimal weights obtained under the absolute value loss function have no analytical form, and the analytical form obtained under the squared loss function is complicated and not easy to apply in practice.

In Chapter 5, we conduct simulation studies to highlight the advantages of our new estimator. When the data are normally distributed, our methods clearly outperform Wan et al.’s method by achieving a smaller relative mean square error (RMSE). When the data are non-normally distributed or even skewed, our methods are also
generally better than Wan et al.’s method. Real data analyses are given in Chapter 6, where the estimates given by our method are closer to the true values. Conclusions are given in Chapter 7, and further developments and some related problems are discussed in Chapter 8.
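
As an illustration only, the estimators described in this abstract can be sketched in a few lines of Python. The function names are hypothetical and not taken from the thesis. The weight w in weighted_sd is left as an input, because the abstract does not state the closed form of the optimal weight; setting w = 0.5 gives the equal-weight combination attributed to Wan et al.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal; inv_cdf is the inverse of Phi


def bland_mean(a, q1, m, q3, b):
    """Bland's mean estimate: weights 1/4, 1/2, 1/4 on (a+b)/2, (q1+q3)/2, m."""
    return (a + 2 * q1 + 2 * m + 2 * q3 + b) / 8


def luo_mean(a, q1, m, q3, b, n):
    """Luo et al.'s mean estimate with the approximated optimal weights."""
    w1 = 2.2 / (2.2 + n ** 0.75)
    w2 = 0.7 - 0.72 / n ** 0.55
    return w1 * (a + b) / 2 + w2 * (q1 + q3) / 2 + (1 - w1 - w2) * m


def weighted_sd(a, q1, q3, b, n, w):
    """Weighted combination of the range-based and IQR-based SD estimates.

    w is the weight on the range term (hypothetical parameter; the optimal
    value is derived in the thesis and is not reproduced here).
    Assumes n >= 2 so that both normal quantiles below are positive.
    """
    zeta = 2 * _STD_NORMAL.inv_cdf((n - 0.375) / (n + 0.25))
    eta = 2 * _STD_NORMAL.inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    return w * (b - a) / zeta + (1 - w) * (q3 - q1) / eta


def wan_sd(a, q1, q3, b, n):
    """Equal-weight SD estimate attributed to Wan et al. (w = 1/2)."""
    return weighted_sd(a, q1, q3, b, n, w=0.5)


if __name__ == "__main__":
    # Made-up five number summary for a study with n = 50 observations.
    a, q1, m, q3, b, n = 2.0, 5.5, 7.0, 9.0, 14.0, 50
    print("Bland mean:", bland_mean(a, q1, m, q3, b))
    print("Luo mean:  ", luo_mean(a, q1, m, q3, b, n))
    print("Wan SD:    ", wan_sd(a, q1, q3, b, n))
```

Lowering w toward 0 as n grows mirrors the behaviour described above, in which the range term receives less weight for large samples and the interquartile range term dominates.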
Keywords/Search Tags:Meta-analysis, mean estimation, standard deviation estimation, range, interquartile range, five number summary