| Metabolomics is the newest of the omics techniques, developed after genomics, transcriptomics and proteomics, and is one of the most important branches of systems biology. With the development of systems biology and bioinformatics, metabolomics plays an important role in revealing the pathogenesis of complex diseases, identifying biomarkers and supporting drug development by virtue of its unique advantages. Owing to the rapid development of high-throughput omics techniques and chemical analysis techniques, massive amounts of biological omics data have accumulated, and metabolomics data are no exception. How to mine valuable information from these data has become crucial. Recently, a large number of bioinformatics tools have been developed for metabolomics data analysis. However, the analysis of metabolomics data, especially LC-MS based data, remains relatively difficult because of its high dimensionality, high noise and sparsity. In metabolomics analysis, data preprocessing has a significant effect on the subsequent analysis, and data normalization is the key preprocessing step. It is difficult to choose an optimal normalization method for a given dataset from the variety of methods available. This paper has three main parts. First, we conducted a systematic investigation of the MetaboLights database, a dedicated repository of raw metabolomics datasets. The normalization methods used on different platforms and with different sample sizes were studied through a literature survey, which found that methods such as log transformation, Auto scaling, Pareto scaling, total sum normalization and PQN normalization were widely used across platforms and sample sizes. We also found cases in which several normalization methods were combined to keep the downstream analysis stable, such as Auto scaling combined separately with total sum normalization and with Mean normalization. Second, we conducted a
comprehensive comparison of 20 classical normalization methods used in metabolomics, based on each method's capability to reduce intragroup variation among biological samples. According to a cluster analysis of the measures of intragroup variance in 28 independent datasets of various sample sizes, MSTUS and Log transformation were identified as the methods that efficiently reduce intragroup variance across sample sizes; VSN, Level, Power and Range normalization can be adopted when the sample size is moderate, while Loess and Contrast normalization are suited to small sample sizes in our datasets. Because Li-Wong, VAST and Sum normalization could not decrease the intragroup variance in almost any of the datasets in our study, we suggest that these methods be chosen with caution for data analysis. Finally, we carried out a comprehensive assessment of the 20 normalization methods on four untargeted metabolomics datasets on disease. Two well-established criteria were used to compare the methods: the capability to reduce intragroup variation and the influence on classification accuracy. As a result, EigenMS, MSTUS, VSN, Cubic, PQN, Median and Log transformation were the better-performing normalization methods across different sample sizes according to the two criteria in the four datasets, while Sum and Contrast normalization underperformed across all four datasets. In summary, this study focused on normalization in the preprocessing step of metabolomics analysis and conducted a comprehensive comparison of 20 widely applied normalization methods using the two criteria above, providing reference and guidance for researchers choosing the most suitable normalization method for metabolomics analysis. |
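
To make the first criterion concrete, the following is a minimal sketch of a few of the normalization methods named above (total sum, PQN, log transformation, Auto scaling, Pareto scaling), together with one possible measure of intragroup variation. It is illustrative only and is not the implementation used in this study. The function names, the synthetic data, the choice of the median spectrum as the PQN reference, and the use of the mean coefficient of variation as the intragroup measure are all assumptions made for this example. Rows are taken to be samples and columns to be metabolite features.

```python
import numpy as np


def sum_normalize(X):
    """Total sum normalization: divide each sample (row) by its total intensity."""
    return X / X.sum(axis=1, keepdims=True)


def pqn_normalize(X):
    """Probabilistic quotient normalization (PQN).

    Assumption: the reference spectrum is the feature-wise median over all samples.
    """
    reference = np.median(X, axis=0)
    quotients = X / reference
    # One dilution factor per sample: the median of its feature quotients.
    factors = np.median(quotients, axis=1, keepdims=True)
    return X / factors


def log_transform(X):
    """Log transformation (base 2); assumes strictly positive intensities."""
    return np.log2(X)


def auto_scale(X):
    """Auto scaling: mean-center each feature and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pareto_scale(X):
    """Pareto scaling: mean-center and divide by the square root of the standard deviation."""
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))


def mean_intragroup_cv(X, groups):
    """Hypothetical intragroup measure: feature-wise CV within each group, averaged."""
    cvs = []
    for g in np.unique(groups):
        block = X[groups == g]
        cvs.append(np.mean(block.std(axis=0, ddof=1) / block.mean(axis=0)))
    return float(np.mean(cvs))


# Synthetic example: 12 samples, 50 features, two groups of 6.
rng = np.random.default_rng(0)
base = rng.uniform(100.0, 1000.0, size=50)            # underlying metabolite profile
groups = np.array([0] * 6 + [1] * 6)
profiles = np.tile(base, (12, 1))
profiles[groups == 1, :5] *= 2.0                      # group difference in 5 features
dilution = rng.uniform(0.5, 2.0, size=(12, 1))        # unwanted sample-wise variation
noise = rng.lognormal(mean=0.0, sigma=0.05, size=(12, 50))
X = profiles * dilution * noise

for name, Xn in [("raw", X), ("sum", sum_normalize(X)), ("PQN", pqn_normalize(X))]:
    print(name, round(mean_intragroup_cv(Xn, groups), 3))
```

In this sketch, a method that removes the simulated dilution effect should yield a lower within-group coefficient of variation than the raw data. Auto scaling and Pareto scaling center each feature at zero, so a ratio-based measure such as the coefficient of variation is not meaningful after them; a variance-based or distance-based measure would be needed instead.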