A New Measure For Evaluating The Improvement Of Model Predictive Ability Based On Risk Stratification

Posted on: 2016-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Zhou
Full Text: PDF
GTID: 2284330482456692
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Diagnostic and prognostic prediction models are widely used in medicine, especially in clinical practice. The goal of statistical modeling is to obtain the best, or a relatively good, prediction model. A new model is usually built either by adding a new covariate to an existing model or by using a different set of variables. The new model is then compared with the existing one to evaluate whether it significantly improves predictive ability; if it does, it replaces the old model. These steps are repeated until a final model is reached. For example, to evaluate the added predictive value of a new biomarker, we usually add it to the existing model and assess the improvement in diagnostic or prognostic predictive ability.

Risk prediction based on risk stratification is one widely used type of statistical model. It is applied in settings such as predicting the 10-year risk of cardiovascular disease (Framingham model), the development of type 2 diabetes (QDScore model), or the development of breast cancer (Gail model). How to evaluate and measure the improvement in predictive ability of a new model over the old one is a critical part of developing a risk prediction model based on risk stratification.

A measure widely used to evaluate and compare the predictive performance of prediction models is the area under the receiver operating characteristic curve (AUC), also known as the C-index or C-statistic. Although AUC is a useful measure, its low sensitivity limits its use for comparing prediction models. It is well known that AUC increases only slightly after a new biomarker is added to an existing model, even when that biomarker is strongly associated with the outcome.
Thus, other measures have emerged recently, among them the widely cited net reclassification improvement (NRI) and integrated discrimination improvement (IDI) proposed by Pencina (2008). However, the statistical properties of both measures have drawn much criticism.

As Greenland (2008) pointed out, IDI, like AUC, is a global measure that provides limited information, which gives it little practical meaning. In addition, IDI is calculated as the change in integrated sensitivity minus the change in integrated 'one minus specificity' over the (0,1) interval when moving from the old model to the new one. The integration assumes a uniform weight over (0,1), which may not be appropriate in practice, because some risk ranges may deserve more weight than others. Moreover, the test of IDI=0 is the same as the test of β=0 for the added biomarker in the new model.

NRI has also drawn much criticism. Its major drawback is that its practical meaning is unclear and it provides limited information. NRI is built on risk stratification: it assesses how individuals move between risk categories under the new model compared with the old one, so as to evaluate how the new model improves risk reclassification. As Kerr (2014) pointed out, NRI is composed of four rates but is not itself a rate, because its value can be above 1 or below 0, which makes it difficult to interpret. Moreover, NRI considers only whether the risk category changes, not by how much it changes. Ignoring the size of the change may discard important information.

In clinical practice, NRI, being based on risk stratification, has more practical meaning than IDI and AUC.
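However, NRI has deficiencies in practical application, so a new measure is needed.

Objective: The objective of the present work is to develop a new measure of the improvement in predictive ability of a new model over an old one, providing a new method for evaluating and comparing statistical models.

Methods: A new measure is first proposed based on risk stratification, taking into account both whether the risk category changes and by how much it changes between the old and the new prediction model. The statistical properties of the new measure are then explored, and its performance is compared with that of other measures. Finally, an example is given for illustration.

The new measure: We define the new measure as the average reclassification improvement (ARI), calculated as the average reclassification improvement among those who develop events minus the average reclassification improvement among those who do not. Let $v_i$ ($i=1,2,\ldots,K$) denote the size of the risk category change and $P=(P_1,P_2,\ldots,P_K)$ the corresponding probabilities, where $K$ is the number of possible values of the category change. With $H$ risk categories there are $K=2H-1$ possible changes, ranging from $-(H-1)$ to $+(H-1)$. Assuming independence between event and non-event individuals, ARI is calculated as

$$\mathrm{ARI} = \mathrm{ARI}_{event} - \mathrm{ARI}_{nonevent} = \sum_{i=1}^{K} v_i P_{i,event} - \sum_{i=1}^{K} v_i P_{i,nonevent}.$$

The sample estimate of ARI is

$$\widehat{\mathrm{ARI}} = \sum_{i=1}^{K} v_i \hat{P}_{i,event} - \sum_{i=1}^{K} v_i \hat{P}_{i,nonevent},$$

where $\hat{P}_{i,event}$ and $\hat{P}_{i,nonevent}$ are the observed proportions of event and non-event individuals whose risk category changes by $v_i$. The sample variance of ARI can be estimated by the sum of $\widehat{\mathrm{var}}(\mathrm{ARI}_{event})$ and $\widehat{\mathrm{var}}(\mathrm{ARI}_{nonevent})$. Following the multinomial distribution, these can be estimated as

$$\widehat{\mathrm{var}}(\mathrm{ARI}_{event}) = \frac{1}{n_{event}}\left[\sum_{i=1}^{K} v_i^2 \hat{P}_{i,event} - \Big(\sum_{i=1}^{K} v_i \hat{P}_{i,event}\Big)^2\right], \qquad \widehat{\mathrm{var}}(\mathrm{ARI}_{nonevent}) = \frac{1}{n_{nonevent}}\left[\sum_{i=1}^{K} v_i^2 \hat{P}_{i,nonevent} - \Big(\sum_{i=1}^{K} v_i \hat{P}_{i,nonevent}\Big)^2\right],$$

respectively, where $n_{event}$ and $n_{nonevent}$ are the numbers of event and non-event individuals.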
Assuming independence between event and non-event individuals, under the null hypothesis $H_0$: ARI=0 we arrive at a simple asymptotic test:

$$z = \frac{\widehat{\mathrm{ARI}}}{\sqrt{\widehat{\mathrm{var}}(\mathrm{ARI}_{event}) + \widehat{\mathrm{var}}(\mathrm{ARI}_{nonevent})}}.$$

This test statistic is expected to be close to the standard normal distribution for sufficiently large sample sizes.

Simulation study: The simulation settings were:
Sample size (n): 500, 1000, 3000;
Overall proportion (PD): 0.05, 0.10, 0.30;
Odds ratio of X (ORX): 1.5, 2, 3, 4, 5, 6, 7, 8, 9;
Odds ratio of M (ORM): 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4, 6, 8.
X was either a single previously known strong predictor or a linear combination of traditional predictor variables. M was a new variable that could potentially be added to a predictive model. X and M were generated from the standard normal distribution N(0,1). All simulations were based on the logistic regression model, and 5000 simulations were conducted for each combination of settings.

Risk stratification: Different sets of categories were considered, such as four categories (0 to <5%, 5% to <10%, 10% to <20%, and 20%+) and three categories (0 to <5%, 5% to <20%, and 20%+).

Results: The statistical properties of the new measure were evaluated through its standard deviation estimate and its type one error.

Standard deviation estimation. The standard deviation estimate of ARI is accurate when PD=0.05 or PD=0.1; when the overall proportion is high, e.g., PD=0.3, the standard deviation is underestimated.

Type one error. With four risk categories, the normal approximation of ARI is inadequate when PD=0.05 but adequate when PD=0.1 or PD=0.3. Under a relatively extreme setting, e.g., PD=0.05 and n=500, the type one error of ARI ranges from 0.0171 to 0.0380, below the pre-defined significance level of 0.05; under other settings it ranges from 0.0329 to 0.0621 and is well controlled except in a few settings.
The results for three risk categories are similar to those for four.

The methods ARI is compared with are NRI and AUC.

Comparison with NRI.
Type one error. For most simulation settings, the type one errors of NRI and ARI are well controlled, all falling within the acceptable interval. For the relatively extreme setting, e.g., PD=0.05 and n=500, the type one errors of NRI and ARI range from 0.0269 to 0.0558 and from 0.0259 to 0.0558, respectively, indicating that the two measures are conservative in this situation.
Statistical power. When PD=0.05 or PD=0.1, ARI has the same statistical power as NRI; when PD=0.3, the power of ARI is slightly higher than that of NRI, by 0.2% to 1.6%, with an average improvement of 0.5%.

Comparison with AUC.
Type one error. DeLong's test for two correlated ROC curves is always conservative.
Statistical power. When PD=0.05 or PD=0.1, ARI has almost the same statistical power as DeLong's test, though its power is lower in some situations, e.g., n=1000 and PD=0.1; when PD=0.3, the power of ARI is higher than that of DeLong's test by 5.7% on average.

An example. The example evaluates whether urinary angiotensinogen (uAGT) and/or urinary albumin-creatinine ratio (UACR) can improve stratification of the risk of developing AKI in patients with acute decompensated heart failure (ADHF). The AUC of M0 (including age, gender, chronic kidney disease, serum albumin, N-terminal brain natriuretic peptide and neutrophil gelatinase-associated lipocalin) is 0.814. The AUC of M1 (M0+uAGT) is 0.874, significantly larger than that of M0 (DeLong test, P<0.001). The AUC of M2 (M1+UACR) is 0.874, showing no improvement over M1. With four risk categories, the NRI and ARI for M1 versus M0 are 0.302 (P<0.001) and 0.423 (P<0.001), respectively. The ARI can be interpreted as follows: after adding uAGT to the old model, the risk classification improved by 0.423 categories on average.
By contrast, the practical meaning of the NRI value is not clear. For M2 versus M1, NRI=0.0 (P=0.997) and ARI=0.0 (P=0.997), indicating that UACR does not improve risk stratification.

The data show that uAGT improves risk stratification while UACR does not; thus M1 (M0+uAGT) is the relatively best model.

Conclusions: We proposed a new measure for evaluating the improvement in predictive ability of a new prediction model over an existing one. The measure is based on risk stratification and takes into account both whether the risk category changes and by how much it changes. In the simulations, the type one error of the new measure is well controlled overall. Its power is almost the same as, or slightly higher than, that of commonly used measures such as NRI and AUC. The new measure offers an alternative where existing measures are unsuitable and an appropriate measure is needed.
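
The ARI estimate and its asymptotic test described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis author's implementation. It assumes that the category change is the new category index minus the old one, and that the variance of each group mean uses the multinomial (plug-in) form. The function names `risk_category`, `ari_test` and `nri` and the toy data are hypothetical. The category-based NRI is included only to show that it uses the direction of the change, while ARI also uses its size.

```python
import math
from bisect import bisect_right


def risk_category(risk, cutoffs):
    """Category index for a predicted risk; cutoffs (0.05, 0.10, 0.20)
    give 0 for <5%, 1 for 5% to <10%, 2 for 10% to <20%, 3 for 20%+."""
    return bisect_right(cutoffs, risk)


def _mean_and_var_of_mean(changes):
    """Mean category change and the plug-in multinomial variance of that mean."""
    n = len(changes)
    if n == 0:
        raise ValueError("both the event and the non-event group must be non-empty")
    mean = sum(changes) / n
    second_moment = sum(c * c for c in changes) / n
    return mean, (second_moment - mean * mean) / n


def _split_changes(old_risk, new_risk, event, cutoffs):
    """Signed category changes (new minus old), split into events and non-events."""
    changes = [risk_category(new, cutoffs) - risk_category(old, cutoffs)
               for old, new in zip(old_risk, new_risk)]
    events = [c for c, y in zip(changes, event) if y == 1]
    nonevents = [c for c, y in zip(changes, event) if y == 0]
    return events, nonevents


def ari_test(old_risk, new_risk, event, cutoffs=(0.05, 0.10, 0.20)):
    """Return (ARI, standard error, z, two-sided p) under H0: ARI = 0."""
    events, nonevents = _split_changes(old_risk, new_risk, event, cutoffs)
    mean_ev, var_ev = _mean_and_var_of_mean(events)
    mean_ne, var_ne = _mean_and_var_of_mean(nonevents)
    ari = mean_ev - mean_ne
    se = math.sqrt(var_ev + var_ne)
    if se == 0:
        return ari, se, float("nan"), float("nan")
    z = ari / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return ari, se, z, p


def nri(old_risk, new_risk, event, cutoffs=(0.05, 0.10, 0.20)):
    """Category-based NRI: uses only the direction of each change, not its size."""
    events, nonevents = _split_changes(old_risk, new_risk, event, cutoffs)
    sign = lambda c: (c > 0) - (c < 0)
    return (sum(map(sign, events)) / len(events)
            - sum(map(sign, nonevents)) / len(nonevents))


if __name__ == "__main__":
    # Toy data (hypothetical): four events followed by four non-events.
    old = [0.04, 0.08, 0.12, 0.03, 0.15, 0.07, 0.25, 0.06]
    new = [0.12, 0.22, 0.12, 0.03, 0.08, 0.02, 0.25, 0.06]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    print("ARI, SE, z, p:", ari_test(old, new, y))
    print("NRI:", nri(old, new, y))
```

In the toy data two events move up by two categories and two non-events move down by one, so ARI is 1.5 while NRI is 1.0. When every move is a single category, the two measures coincide.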
Keywords/Search Tags:predictive model, risk stratification, reclassification, NRI, ARI