| Background: When evaluating the efficacy of drugs in clinical trials, the endpoint with the best sensitivity for detecting treatment effects and the greatest clinical relevance to the goals of the trial is called the true endpoint. However, the true endpoint can be difficult to use in practice because of weaknesses such as requiring a long follow-up time. To resolve this conflict, we need an endpoint that can be measured more easily, earlier, and more conveniently to replace the true endpoint. Such a replacement is termed a surrogate endpoint. It can shorten follow-up time, reduce the rate of missing data, and improve the efficiency and reliability of the trial. Although early failures in the use of surrogate endpoints have led to persistent skepticism, the increasing demand for accelerated drug approval has led to their widespread use, especially in clinical trials of antineoplastic drugs. The National Medical Products Administration (NMPA), the Food and Drug Administration (FDA), and the European Medicines Agency (EMA) all encourage the use of surrogate endpoints in clinical trials to expedite drug development. However, not all surrogate endpoints translate into long-term benefits on the true endpoint, which poses new challenges for surrogate endpoint evaluation. A large number of surrogate evaluation methods have emerged over nearly 20 years. Prentice first proposed a statistical definition of surrogate endpoints together with the Prentice criterion. Freedman then proposed the proportion of treatment effect explained (PTE) as the first indicator to quantify surrogacy. Buyse and Molenberghs further proposed the relative effect (RE) and adjusted association (AA). However, these evaluation methods, which are based on a single clinical trial, often require a very large sample size to ensure stable evaluation. Therefore, Buyse et al. proposed surrogate endpoint evaluation methods based on multiple clinical trials that estimate the trial-level Rt2 and the individual-level Ri2, which is now a more 
recognized surrogate endpoint evaluation system. Building on multiple trials, Molenberghs et al. proposed methods to evaluate binary and ordinal endpoints, Burzykowski et al. proposed methods to evaluate survival endpoints, and Alonso et al. proposed methods to evaluate repeated-measures endpoints. However, only one study has addressed multiple-trial evaluation methods for longitudinal surrogate endpoints with survival true endpoints, and no software implementation is available. Therefore, based on multiple trials, this paper further explores evaluation methods for single and multiple longitudinal surrogate endpoints and establishes evaluation software, providing an application scheme. In addition, Alonso et al. proposed the likelihood reduction factor (LRF) for evaluating individual-level surrogacy. Alonso and Molenberghs also proposed the trial-level Rht2 and individual-level Rhind2 based on an information-theoretic approach. Methods based on causal inference are also beginning to emerge in the evaluation of surrogate endpoints. Objective: The purpose of this study is to establish a surrogate endpoint evaluation method based on PTE that is applicable to multiple trials and to explore its evaluation performance. At the same time, based on the joint model, the trial-level Rt2 and individual-level Ri2 of a longitudinal surrogate endpoint and a survival true endpoint are estimated, and their performance is evaluated under different scenarios. In addition, an evaluation method for multiple longitudinal surrogate endpoints in multiple trials is established to fill the research gap. Second, this study establishes an online surrogate endpoint evaluation system based on the proposed evaluation methods and indicators, lowering the barrier to use so that surrogate endpoint evaluation no longer remains theoretical but can be applied and popularized in practice. Finally, based on the proposed surrogate endpoint evaluation methods and indicators, this study will evaluate the 
surrogacy of progression-free survival (PFS) for overall survival (OS) in advanced ovarian cancer, as well as the surrogacy of longitudinal minimal residual disease (MRD) and longitudinal treatment response for PFS in multiple myeloma, providing a basis for selecting trial endpoints in subsequent clinical trials of advanced ovarian cancer and multiple myeloma. Methods: First, for two data types, "continuous surrogate endpoint-continuous true endpoint" and "longitudinal surrogate endpoint-survival true endpoint", simulated datasets were generated from combinations of the data scenarios "number of trials", "sample size of trials", "trial-level Rt2", "individual-level Ri2", "heterogeneity of αi", and "heterogeneity of βi". Second, for continuous surrogate endpoints, bivariate mixed-effects, bivariate fixed-effects, univariate mixed-effects, and univariate fixed-effects models, each in full, semi-reduced, and reduced forms, were constructed to estimate trial-level and individual-level surrogacy. Meanwhile, the median of proportion of the treatment effect explained (MPTE) and the adjusted median of proportion of the treatment effect explained (ADJMPTE) were established. Next, for single longitudinal surrogate endpoints, a joint model and a Bayesian joint model were constructed to estimate MPTE and ADJMPTE, and a random effects joint model was constructed to estimate the trial-level Rt2 and individual-level Ri2. For multiple longitudinal surrogate endpoints, a multivariate Bayesian joint model was constructed to estimate MPTE and ADJMPTE, and a multivariate random effects joint model was constructed to estimate the trial-level Rt2. Subsequently, the constructed models were applied to the simulated datasets to observe their performance in evaluating surrogate endpoints under various data scenarios. After simulation testing, the constructed models were embedded in an online surrogate endpoint evaluation system. Finally, the constructed continuous surrogate endpoint evaluation 
model was applied to a study of ovarian cancer to evaluate the surrogacy of PFS for OS. The constructed longitudinal surrogate endpoint evaluation model was applied to multiple myeloma studies to evaluate the surrogacy of longitudinal MRD and response for PFS. Results: In the simulation study in which both the surrogate endpoint and the true endpoint are continuous variables, the accuracy and trend of each surrogate endpoint evaluation indicator were examined under different numbers of trials, trial sample sizes, trial-level Rt2, individual-level Ri2, and heterogeneity of αi and βi. The MPTE and ADJMPTE estimated by simple linear regression and by the bivariate mixed-effects reduced model were similar, and both methods were considered suitable. The estimates of trial-level Rt2 and individual-level Ri2 were accurate in each scenario, especially with a sufficient number of trials and a sufficient trial sample size. When the heterogeneity of αi and βi is equal, MPTE is not biased towards Rt2 or Ri2 but contains information on both, and its estimates lie between Rt2 and Ri2. Therefore, MPTE is considered appropriate for the evaluation of surrogate endpoints and reflects both trial-level and individual-level information, which is more comprehensive and accurate than evaluating with Rt2 or Ri2 alone. However, when the heterogeneity of αi and βi is unequal, MPTE is biased and no longer appropriate for evaluating surrogate endpoints, whereas the corrected ADJMPTE remains between Rt2 and Ri2 and accurately reflects surrogacy. In the simulation study in which the surrogate endpoint is a longitudinal outcome and the true endpoint is a survival outcome, the accuracy and trend of each surrogate endpoint evaluation indicator were examined under different numbers of trials, trial sample sizes, and trial-level Rt2. The MPTE and ADJMPTE estimated by the joint model and the Bayesian joint model were close, and both models were considered suitable. In the simulation study of a single 
longitudinal surrogate endpoint, the random effects joint model estimated Rt2 with high accuracy, although the 95% confidence interval (CI) was wider and the absolute bias larger when the number of trials was small. Accuracy improved as the number of trials and the trial sample size increased. In the longitudinal surrogate endpoint evaluation, the Ri2 of each trial is a time-varying curve, and the true value of Ri2 fluctuates between 0.97 and 1.00 under the parameter settings. As the true value of Rt2 increases, both MPTE and ADJMPTE increase, and the trend remains consistent. Numerically, however, MPTE is low, while ADJMPTE lies between Rt2 and Ri2 and accurately reflects surrogacy. In the simulation studies with multiple longitudinal surrogate endpoints, the Rt2 estimated by the multivariate random effects joint model was highly accurate. As Rt2 increases, both MPTE and ADJMPTE increase, and the trend remains consistent. Numerically, MPTE and ADJMPTE are similar, possibly because the heterogeneity of α1i, α2i, and βi is similar in the simulated data. Although MPTE and ADJMPTE are similar and reflect the true surrogacy, ADJMPTE is still recommended for surrogate endpoint evaluation based on the simulation results for continuous surrogate endpoints and single longitudinal surrogate endpoints, because MPTE may not accurately reflect surrogacy when the heterogeneity of α1i, α2i, and βi becomes larger. The online surrogate endpoint evaluation system built for this study has been released and supports the evaluation of surrogate endpoints in six scenarios: "Single Trial-Continuous-Continuous", "Single Trial-Single Longitudinal-Survival", "Single Trial-Multiple Longitudinal-Survival", "Multiple Trials-Continuous-Continuous", "Multiple Trials-Single Longitudinal-Survival", and "Multiple Trials-Multiple Longitudinal-Survival". Users can complete the evaluation of surrogate endpoints 
online simply by following the example data format, uploading their data, and defining the corresponding variable names according to the page prompts. The system can be accessed from desktop web browsers as well as from mobile phones and tablets. In the case study of advanced ovarian cancer, log(PFS) and log(OS) were evaluated as continuous endpoints, giving Rt2 between 0.868 and 0.917, Ri2 between 0.886 and 0.888, MPTE=0.948, and ADJMPTE=0.865. The evaluation of PFS and OS as survival endpoints, taking into account both the time variables and censoring status, was based on an information-theoretic approach and gave trial-level Rht2=0.9184 (95% CI: 0.8674, 0.9695), individual-level Rhind2=0.7446 (95% CI: 0.7152, 0.7720), and adjusted individual-level Rhind.qf2=0.8193 (95% CI: 0.7928, 0.8433). The results of the two methods were consistent, and PFS was considered a qualified surrogate endpoint for OS in advanced ovarian cancer. Longitudinal MRD, as a single longitudinal surrogate endpoint for PFS in multiple myeloma trials, gave MPTE=0.377, ADJMPTE=0.978, and Rt2=0.274. Longitudinal response, as a single longitudinal surrogate endpoint for PFS, gave MPTE=0.324, ADJMPTE=0.939, and Rt2=0.804. Combining longitudinal MRD and response as multiple longitudinal surrogate endpoints for PFS gave MPTE=0.706, ADJMPTE=0.971, and Rt2=0.572. Both MRD and response performed well as single surrogate endpoints, especially response, which had both a high ADJMPTE and a high Rt2. The trial-level Rt2 of MRD, on the other hand, was lower, probably because of its lower mean number of visits and shorter visit duration. Although MPTE and ADJMPTE were high in the multiple longitudinal surrogate endpoint evaluation that combined MRD and response, the relatively low Rt2 may be related to the low Rt2 of MRD. Overall, MRD and response are favorable surrogates for PFS, both as single surrogate endpoints and in combination. Conclusions: In conclusion, the MPTE and ADJMPTE proposed in this study have been demonstrated 
by simulation studies to evaluate surrogate endpoints accurately across multiple trials. At the same time, the Rt2 and Ri2 for a longitudinal surrogate endpoint estimated by the random effects joint model are accurate. The Rt2 and Ri2 for multiple longitudinal surrogate endpoints estimated by the multivariate random effects joint model are also very accurate. The MPTE and ADJMPTE estimated from the Bayesian joint model and the multivariate Bayesian joint model can accurately evaluate surrogate endpoints, although accuracy is affected when the number of trials is small. The online surrogate endpoint evaluation system established in this study effectively lowers the barrier to surrogate endpoint evaluation, facilitates the dissemination of theoretical methods, and provides more opportunities for application. The two case studies demonstrated that PFS is a qualified surrogate endpoint for OS in advanced ovarian cancer trials, and that MRD and response are qualified surrogate endpoints for PFS in multiple myeloma trials. |
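
As a rough illustration of the PTE-based and trial-level indicators named in the abstract, the following Python sketch computes Freedman's PTE within each trial by ordinary least squares, takes the median across trials as an MPTE-like summary, and computes an unweighted trial-level R-squared by regressing per-trial treatment effects on the true endpoint on those on the surrogate. The abstract does not state the exact MPTE or ADJMPTE definitions, so the median-of-per-trial-PTE reading, the function names, and the toy data are all assumptions. ADJMPTE and the joint models are not attempted here.

```python
from statistics import mean, median


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (intercept included)."""
    b = ols_slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def pte(z, s, t):
    """Freedman's PTE for one trial: 1 - beta_S / beta.

    beta   : treatment effect on the true endpoint t, unadjusted.
    beta_S : treatment effect on t after adjusting for the surrogate s,
             obtained by partialling s out of both z and t.
    """
    beta = ols_slope(z, t)
    beta_s = ols_slope(residuals(s, z), residuals(s, t))
    return 1.0 - beta_s / beta


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def make_trial(direct_effect):
    """Hypothetical toy trial: t = 2*s + direct_effect*z, with no noise."""
    z = [0, 0, 0, 1, 1, 1]                 # treatment indicator
    s = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]     # surrogate endpoint
    t = [2.0 * si + direct_effect * zi for si, zi in zip(s, z)]
    return z, s, t


# Assumed MPTE-like summary: median of the per-trial PTE values.
trials = [make_trial(c) for c in (0.0, 2.0, 6.0)]
pte_values = [pte(z, s, t) for z, s, t in trials]
mpte = median(pte_values)

# Unweighted trial-level R^2 from hypothetical per-trial treatment effects
# on the surrogate (alphas) and on the true endpoint (betas).
alphas = [1.0, 2.0, 3.0]
betas = [1.0, 3.0, 2.0]
trial_level_r2 = r_squared(alphas, betas)

print([round(v, 3) for v in pte_values], round(mpte, 3), round(trial_level_r2, 3))
```

The adjusted effect is obtained by partialling the surrogate out of both treatment and outcome, which gives the same coefficient as a two-predictor regression without needing a matrix library. A real analysis would also have to account for estimation error in the per-trial effects, which this sketch ignores.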