Algorithm Research And Online Tool Development Of Data Analysis For Metabolomics

Posted on: 2021-07-31  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Q X Yang  Full Text: PDF
GTID: 1480306107984759  Subject: Chemical Engineering and Technology
Abstract/Summary:
Metabolomics plays an important role in the study of biological systems, aiming to systematically identify the low-molecular-weight metabolites in complex biological samples. At present, metabolomics can simultaneously monitor thousands of metabolites in biological samples such as blood, urine, cerebrospinal fluid, cells and tissues. However, major challenges remain in metabolomics research. (1) Sample sizes are typically in the tens or hundreds because of high experimental cost and limited experimental resources. Small, unrepresentative sample sets undermine the stability and classification ability of the subsequent data analysis. (2) Although normalization is widely used to remove unwanted experimental/biological variation, different methods can give different results even for the same data, so choosing an appropriate normalization method is a key challenge. (3) Biomarkers discovered in different omics studies have been reported to be highly unstable. This inconsistency undermines the credibility of reported biomarkers and greatly hinders their clinical application. (4) In untargeted metabolomics, fewer than 2% of mass spectrum peaks can be successfully identified as metabolites.

To address these problems, this research focuses on four aspects. (1) A method is proposed to combine multiple experimental datasets into one data matrix, increasing sample size and representativeness. With this integration algorithm, larger-scale metabolomics studies can be performed and more stable results achieved in biomedical data analysis. (2) To correct signal drift and remove unwanted variation, NOREVA (https://idrblab.org/noreva/), an online tool, was developed to evaluate the performance of normalization methods based on multiple criteria. NOREVA provides the most comprehensive set of normalization methods (≥24), including internal standard-based normalization and quality control sample-based correction. According to their formulas, the methods can be divided into sample-based normalization (which uses a reference sample to reduce differences between samples) and metabolite-based normalization (which uses a scaling factor to reduce metabolite deviation). This study proposed and verified that a combination strategy (combining sample-based and metabolite-based normalization) could yield superior performance in metabolomics. Most importantly, these normalization methods were extended from case-control studies to time-course and multi-class problems. For such studies, NOREVA provides comprehensive normalization (168 normalization methods) and systematic evaluation to help choose an appropriate method. (3) To tackle the instability of biomarkers, this study constructed a new stable feature selection method based on the support vector machine-recursive feature elimination (SVM-RFE) algorithm, integrating random sampling and consistency scoring. Compared with traditional methods, the new method shows greater consistency and better classification ability. Its application in metabolomics will provide important clues for the discovery of diagnostic molecules and drug targets. (4) A web server, MMEASE (https://idrblab.org/mmease/), was developed for metabolomics data analysis. It includes a metabolite database of more than 330,000 metabolites, including 107,071 endogenous metabolites, 124,451 exogenous metabolites and 169,352 peptides, with detailed exogenous annotation information for these metabolites. In addition, MMEASE provides enrichment analysis for metabolites, covering chemical families, cosmetic ingredients, food ingredients and food additives, plant metabolites and agrochemicals, small molecule drugs and drug metabolites, toxins, environmental pollutants and microbial metabolites.

In summary, this study aims to solve these urgent problems in metabolomics. A series of studies was carried out on data integration, data normalization, biomarker identification and metabolite annotation. Innovative algorithms and online tools for metabolomics data analysis were constructed, which could provide strong support for integrating large-scale data, selecting an appropriate normalization method, discovering stable and reliable biomarkers and interpreting biological significance. With the advent of the era of big data and precision medicine, this work can lay a solid foundation for life science research.
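
The combination strategy described under (2), a sample-based normalization followed by a metabolite-based one, can be illustrated with a minimal sketch. The abstract does not say which method pairs are combined, so total-sum scaling (sample-based) and Pareto scaling (metabolite-based) are assumptions chosen only for illustration. The function names and the toy data are hypothetical and are not taken from NOREVA.

```python
from statistics import mean, stdev

def sample_normalize(matrix):
    """Sample-based step (assumed: total-sum scaling).
    Each row is one sample; divide its intensities by the row total."""
    return [[v / sum(row) for v in row] for row in matrix]

def metabolite_scale(matrix):
    """Metabolite-based step (assumed: Pareto scaling).
    Each column is one metabolite; center it and divide by sqrt(std)."""
    scaled_cols = []
    for col in zip(*matrix):
        mu, sd = mean(col), stdev(col)
        if sd == 0:
            # A constant metabolite carries no variation; map it to zeros.
            scaled_cols.append([0.0 for _ in col])
        else:
            scaled_cols.append([(v - mu) / sd ** 0.5 for v in col])
    # Transpose back to samples x metabolites.
    return [list(row) for row in zip(*scaled_cols)]

def combined_normalize(matrix):
    """Combination strategy: sample-based first, then metabolite-based."""
    return metabolite_scale(sample_normalize(matrix))

# Hypothetical toy data: 3 samples x 3 metabolites.
# Sample 2 is sample 1 at half the overall concentration.
data = [
    [10.0, 20.0, 40.0],
    [5.0, 10.0, 20.0],
    [12.0, 18.0, 45.0],
]
normalized = combined_normalize(data)
```

In this toy case the sample-based step removes the overall dilution difference between the first two samples, and the metabolite-based step then puts the three metabolites on a comparable scale so that the most abundant one does not dominate downstream analysis.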
Keywords/Search Tags:Metabolomics, Data Analysis, Biomarker discovery, Data Normalization, Web-based Tool