| Metabolomics generates a profile of small molecules that are derived from cellular metabolism and directly reflect the outcome of complex networks of biochemical reactions, thus providing insights into multiple aspects of cellular physiology. With the development of systems biology and bioinformatics, metabolomics, by virtue of its unique advantages, plays an important role in revealing the pathogenesis of complex diseases, identifying biomarkers and developing drugs. Owing to the rapid development of high-throughput omics and chemical analysis techniques, massive amounts of biological omics data, including metabolomics data, have accumulated. Because clinical data are collected over extended periods and the number of analyzed samples is large, long-term and large-scale pharmacometabonomic profiling is frequently encountered in drug/target discovery and in the guidance of personalized medicine. So far, integration of the results (ReIn) from multiple experiments in large-scale metabolomic profiling has become a widely used strategy for enhancing the reliability and robustness of analytical results, and direct data merging (DiMe) among experiments has also been proposed to increase statistical power, reduce experimental bias, enhance reproducibility and improve overall biological understanding. However, compared with ReIn, DiMe has not yet been widely adopted in current metabolomics studies, owing to the difficulty of removing unwanted variations and the lack of prior knowledge about the performance of the available merging methods. It is therefore urgent to clarify whether DiMe can enhance the performance of metabolic profiling. Herein, the performance of DiMe on 4 pairs of benchmark datasets was comprehensively assessed using multiple criteria (classification capacity, robustness and false discovery rate). First, the MetaboLights database (Haug et al., 2013) was systematically searched with
the keyword “mass spectrometry”, which returned 339 projects (as of September 16, 2018). Second, several criteria were applied to ensure the availability and processability of the raw metabolomics data. Then, three well-established criteria for assessing the performance of data merging strategies in LC-MS-based metabolomics were adopted in this study: identification precision, classification capacity and robustness. Identification precision was measured by the enrichment factor (EF), which quantifies how much a given analytical strategy increases the chance of identifying true markers compared with randomly selecting true markers from all metabolites (Liu et al., 2014; Zhang et al., 2011). Classification capacity was evaluated by receiver operating characteristic (ROC) analysis together with the area under the curve (AUC) (Kohl et al., 2012). For robustness, an overlap value was calculated (Equation 6) from multiple pairs of marker lists; the closer the overlap value is to 1, the more robust the markers discovered in a study (Wang et al., 2014). As a result, the integration/merging-based strategies (ReIn and DiMe) were found to outperform strategies based on a single experiment under all criteria. Moreover, DiMe outperformed ReIn in classification capacity and robustness, whereas ReIn was superior in controlling the false discovery rate. In conclusion, starting from the data merging methods used in current metabolomics analysis, this study comprehensively compared 3 data merging methods widely applied in LC-MS-based metabolomics using three criteria (classification capacity, robustness and false discovery rate), providing valuable guidance for researchers in selecting a suitable analytical strategy for current metabolomics studies. |
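The three assessment criteria described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the enrichment factor is taken as the hit rate among the selected metabolites divided by the hit rate expected from random selection; AUC is computed through its rank-based (Mann-Whitney) interpretation rather than by tracing a ROC curve; and, because Equation 6 is not reproduced here, the overlap between two marker lists is assumed to be |A ∩ B| / min(|A|, |B|), averaged over all pairs of lists. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Set


def enrichment_factor(selected: Set[str], true_markers: Set[str], n_total: int) -> float:
    """Assumed EF: hit rate among selected metabolites / hit rate of random selection."""
    if not selected or not true_markers or n_total <= 0:
        raise ValueError("selected, true_markers and n_total must be non-empty/positive")
    hit_rate = len(selected & true_markers) / len(selected)
    random_rate = len(true_markers) / n_total
    return hit_rate / random_rate


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """ROC AUC as P(positive sample outscores negative sample); ties count as 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def mean_pairwise_overlap(marker_lists: Sequence[Set[str]]) -> float:
    """Hypothetical overlap (not Equation 6): |A & B| / min(|A|, |B|), averaged over pairs."""
    pairs = list(combinations(marker_lists, 2))
    if not pairs:
        raise ValueError("need at least two marker lists")
    total = 0.0
    for a, b in pairs:
        total += len(a & b) / min(len(a), len(b))
    return total / len(pairs)


if __name__ == "__main__":
    # Toy data: 100 metabolites, 5 true markers, a strategy that selects 4 (2 correct).
    ef = enrichment_factor({"m1", "m2", "m3", "m4"}, {"m1", "m2", "m5", "m6", "m7"}, 100)
    print(round(ef, 2))  # → 10.0
    print(round(auc([0.9, 0.8, 0.4], [0.3, 0.5]), 3))  # → 0.833
    lists = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}]
    print(round(mean_pairwise_overlap(lists), 3))  # → 0.556
```

An EF above 1 means the strategy enriches true markers relative to chance, and an overlap of 1 means every pair of marker lists agrees completely, matching the interpretation given in the text.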