
Study On Missing Values Imputation And Batch Effect Correction In The Data Preprocessing Of Mass Spectrometry-Based Metabolomics

Posted on: 2022-02-06  Degree: Master  Type: Thesis
Country: China  Candidate: Y S Wang  Full Text: PDF
GTID: 2531306323971649  Subject: Electronics and Communications Engineering
Abstract/Summary:
Mass spectrometry (MS) offers high sensitivity, high throughput, and broad molecular coverage, and is widely applied in untargeted and targeted metabolomics to screen potential biomarkers and explore the molecular mechanisms of disease. However, owing to the heterogeneity of biological samples and instrument instability, MS-based metabolomics data are often affected by interfering factors such as peak shifting, missing values, and unwanted variation. Data preprocessing is therefore necessary to eliminate the impact of these factors before downstream statistical analysis. The present study focuses on missing value imputation and batch effect correction in data preprocessing. The main contents are as follows:

1. The issue of missing values in MS-based metabolomics is discussed first, and a non-negative matrix factorization (NMF)-based scheme for missing value imputation is proposed. The method is expected to capture the inherent features of metabolic profiles better because of NMF's capability for adaptive local representation. NMF imputation is then compared with three commonly used approaches from three perspectives: numerical accuracy of imputation, recovery of data structure, and ranking of imputation performance. These are measured respectively by normalized root mean squared error (NRMSE), reconstruction error of a Gaussian graphical model (GGM), and mean score of ranking (MSR). The results show that the NMF-based scheme adapts well to various missingness scenarios and to the presence of outliers in MS-based metabolic profiles, and that it outperforms the other commonly used methods.

2. The second part of this thesis proposes a new method for batch effect correction, namely quality control-based linear regression (QC-LR). The method builds a linear regression model by training on the QC samples from different batches and then transfers the model parameters to the sample data to correct the shifts caused by batch effects. Compared with other traditional methods, QC-LR not only effectively eliminates the abundance variation caused by batch effects but also retains the original relationships between variables in the metabolomic dataset. QC-LR therefore has advantages in ease of operation, low computational complexity, and strong correction capability, and could be recommended as a necessary step for data preprocessing in MS-based metabolomics.
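The abstract does not give the details of the NMF imputation scheme, so the following is only a minimal sketch of one common way to impute with NMF: masked multiplicative updates that fit the factors W and H to the observed entries alone, followed by filling the gaps from the product WH. The function names `nmf_impute` and `nrmse`, the rank, the iteration count, and the variance-normalized form of NRMSE are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def nmf_impute(X, rank=2, n_iter=500, seed=0, eps=1e-9):
    """Fill NaN entries of a non-negative matrix using masked NMF.

    Hypothetical helper: minimizes ||M * (X - W @ H)||_F^2, where M marks
    the observed entries, using multiplicative updates.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                # True where a value was observed
    X0 = np.where(mask, X, 0.0)        # missing cells set to 0; the mask ignores them
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + 0.1    # strictly positive initial factors
    H = rng.random((rank, m)) + 0.1
    for _ in range(n_iter):
        WH = W @ H
        H *= (W.T @ X0) / (W.T @ (mask * WH) + eps)
        WH = W @ H
        W *= (X0 @ H.T) / ((mask * WH) @ H.T + eps)
    # Keep observed values as they are; fill only the missing cells.
    return np.where(mask, X, W @ H)


def nrmse(X_true, X_imputed, missing):
    """NRMSE over the missing cells, normalized by the variance of the
    true values in those cells (one common definition, assumed here)."""
    diff = X_true[missing] - X_imputed[missing]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(X_true[missing])))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic non-negative low-rank "profile" (samples x metabolites).
    X_true = rng.random((30, 2)) @ rng.random((2, 10))
    missing = rng.random(X_true.shape) < 0.1
    X_obs = X_true.copy()
    X_obs[missing] = np.nan
    X_imp = nmf_impute(X_obs, rank=2)
    print("NRMSE:", nrmse(X_true, X_imp, missing))
```

In an evaluation of this kind, values are removed from a complete matrix on purpose, so that the imputed entries can be scored against the known truth. The rank and the number of iterations would need tuning on real metabolic profiles.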
Keywords/Search Tags: mass spectrometry, metabolomics, missing values, batch effects, non-negative matrix factorization