Research On Mass Spectrum Feature Recognition And Chromatographic Retention Time Prediction

Posted on: 2023-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Ju
Full Text: PDF
GTID: 1521307031478154
Subject: Computer application technology
Abstract/Summary:
Metabolomics analyzes changes in the types and contents of endogenous metabolites to infer the biological events that have occurred in an organism. It is widely used in fields such as nutrition science, precision medicine and translational medicine. Ultra-high-performance liquid chromatography coupled with high-resolution mass spectrometry (UPLC-HRMS) is the main analytical tool of metabolomics and can obtain rich mass spectrum signals of metabolites from biological samples. Extracting metabolite information involves two steps. First, mass spectrum features must be recognized: false positives removed, false negatives reduced, the loss of low-abundance metabolites avoided, and the recognition coverage of metabolites improved. Second, the mass spectrum features must be identified as metabolites in order to screen biomarkers and study the biological function of metabolites. Identification based only on mass spectrum information leads to high false positive rates; chromatographic retention time can improve identification accuracy, but most metabolites in databases lack retention times. Focusing on these problems, this dissertation studies methods for mass spectrum feature recognition and chromatographic retention time prediction. The main contents are as follows:

(1) A new method for recognizing false positive mass spectrum features based on information entropy and statistical correlation was proposed. First, a new information entropy index was defined to evaluate the quality of extracted ion chromatograms. An algorithm based on this entropy index and statistical correlation was then developed to recognize chromatographic peak profiles, and it was used to recognize false positive mass spectrum features arising from noise and from compounds that do not originate from the sample. For the UPLC-HRMS data of a metabolite standards mixture sample, the proposed method removed more than 92% of the false positive mass spectrum features while retaining all metabolite standards. For the UPLC-HRMS data of a urine sample, the method reduced the number of mass spectrum features from 7182 to 2522 while retaining 98% of the identified metabolites. In addition, the proposed method performed better than RSD and MS-FLO at recognizing false positive mass spectrum features in both the metabolite standards mixture and urine. These results show that the proposed method can effectively remove false positive mass spectrum features while retaining the mass spectrum information of metabolites.

(2) A new method for improving the recognition coverage of metabolites based on a similarity graph of mass spectrum features was proposed. The method integrates the mass spectrum features of metabolites recognized by different peak matching methods to obtain rich metabolite information, and then builds a similarity graph of the mass spectrum features. By searching for maximal complete subgraphs, it recognizes the highly clustered mass spectrum features in the retention time versus mass-to-charge ratio space and thereby removes the redundant information introduced by integration. In the experiment on UPLC-HRMS data of a mixture sample containing 41 metabolite standards at low concentration, XCMS, MZmine 2 and SIEVE recognized 19, 19 and 27 metabolites respectively, while the proposed method recognized 37. For the UPLC-HRMS data of a diluted urine sample, the three peak matching methods recognized 1360, 2455 and 643 mass spectrum features of metabolites, while the proposed method recognized 2960, including 991 mass spectrum features with low abundance. A total of 1619 metabolites were obtained through ion fusion. The results show that this method can significantly improve the recognition coverage of metabolites and identify more low-abundance metabolites, which gives it good practical application value.

(3) A method for predicting the retention time of compounds based on weighted pre-training and transfer learning was proposed. During pre-training, mutual information was used to evaluate the relationship between each feature in the molecular descriptors and retention time, and the roles of the important descriptor features in the loss function were strengthened. This improved pre-training performance, and a deep neural network prediction model with good performance was established on a data set containing retention times for a large number of compounds. Then, through transfer learning, the model was fine-tuned on a small data set of compounds with known retention times in the target chromatographic system and used to predict the retention time of any compound in that system. In the experiment, a deep neural network was established based on the SMRT data of 80038 compounds and then transferred to 14 different chromatographic systems. The results show that, in most cases, the proposed method predicted better than transfer learning based on DNN, VW-SAE, AE-DNN and GNN-RT, and better than the machine learning algorithms RF, GB and LASSO. The method can therefore effectively address retention time prediction under different chromatographic conditions when only a small number of compound retention times are available. The annotation results for 133 metabolite standards show that the proposed method has good practical value in the identification of mass spectrum features.
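The abstract does not give the definition of the entropy index used in contribution (1). As a rough illustration of the underlying idea only, the sketch below scores an extracted ion chromatogram by the normalized Shannon entropy of its intensity distribution. The function name `normalized_entropy` and the example traces are hypothetical, and this stand-in index is an assumption, not the dissertation's index. The intuition is that a well-shaped chromatographic peak concentrates its signal in a few scans (low entropy), whereas noise spreads it evenly (entropy near 1).

```python
import math

def normalized_entropy(intensities):
    """Shannon entropy of an intensity trace, scaled to [0, 1].

    Assumed stand-in for an entropy-based EIC quality index;
    intensities are expected to be non-negative.
    """
    total = sum(intensities)
    n = len(intensities)
    if total <= 0 or n < 2:
        return 1.0  # treat empty or all-zero traces as uninformative
    h = 0.0
    for x in intensities:
        if x > 0:
            p = x / total          # share of the total signal in this scan
            h -= p * math.log(p)
    return h / math.log(n)         # divide by the maximum possible entropy

# Hypothetical traces: a sharp peak and a flat, noise-like baseline.
peak_like = [0, 0, 1, 5, 20, 60, 20, 5, 1, 0, 0]
noise_like = [5, 6, 5, 4, 6, 5, 5, 6, 4, 5, 5]

peak_score = normalized_entropy(peak_like)
noise_score = normalized_entropy(noise_like)
```

Under this assumed index, a feature whose chromatogram scores close to 1 would be a candidate false positive; any threshold would have to be chosen empirically.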
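The graph step in contribution (2) can be illustrated with a small sketch. It assumes that two features are "similar" when their retention times and mass-to-charge ratios agree within fixed tolerances, and it enumerates maximal complete subgraphs with the basic Bron-Kerbosch algorithm. The tolerances `rt_tol` and `ppm_tol`, the function names and the example features are all hypothetical; the abstract does not state the similarity measure or the search algorithm the dissertation uses.

```python
def build_similarity_graph(features, rt_tol=0.1, ppm_tol=10.0):
    """Connect features whose (rt, mz) agree within assumed tolerances."""
    adj = {i: set() for i in range(len(features))}
    for i, (rt_i, mz_i) in enumerate(features):
        for j in range(i + 1, len(features)):
            rt_j, mz_j = features[j]
            close_rt = abs(rt_i - rt_j) <= rt_tol
            close_mz = abs(mz_i - mz_j) / mz_i * 1e6 <= ppm_tol
            if close_rt and close_mz:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal complete subgraphs (Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended any further
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}   # v has been fully explored
            x = x | {v}   # exclude v from later cliques at this level

    expand(set(), set(adj), set())
    return cliques

# Hypothetical (retention time in min, m/z) features pooled from three tools.
features = [
    (5.00, 180.0634), (5.02, 180.0636), (5.01, 180.0633),  # same ion, three tools
    (9.40, 255.2320), (9.43, 255.2322),                    # same ion, two tools
    (12.00, 300.1000),                                     # found by one tool only
]
groups = sorted(maximal_cliques(build_similarity_graph(features)))
```

Each group can then be collapsed into a single representative feature, which removes the redundancy created by pooling the outputs of several peak matching methods while keeping features that only one tool detected.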
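For contribution (3), the abstract says that mutual information is used to strengthen the role of important descriptor features in the pre-training loss, but it does not give the formula. The sketch below shows one possible reading, stated as an assumption: estimate the mutual information between each descriptor column and retention time with a simple histogram estimator, normalize the scores into weights, and use those weights in a per-feature weighted squared error. The function names, the bin count and the toy data are hypothetical, and the fine-tuning (transfer learning) stage is not shown.

```python
import math
from collections import Counter

def binned(values, bins):
    """Map values to equal-width bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant columns fall into bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=4):
    """Histogram estimate of the mutual information (in nats) of x and y."""
    bx, by = binned(x, bins), binned(y, bins)
    n = len(x)
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (a, b), c in cxy.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (cx[a] * cy[b]))
    return mi

def mi_weights(columns, rt, bins=4):
    """Normalize per-feature MI scores so that the weights sum to 1."""
    scores = [mutual_information(col, rt, bins) for col in columns]
    total = sum(scores)
    if total <= 0:
        return [1.0 / len(columns)] * len(columns)
    return [s / total for s in scores]

def weighted_squared_error(target, predicted, weights):
    """Assumed loss form: errors on high-MI features count for more."""
    return sum(w * (t - p) ** 2 for w, t, p in zip(weights, target, predicted))

# Toy data: one descriptor that tracks retention time and one constant one.
rt = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
informative = [0.5 * v for v in rt]
constant = [3.0] * len(rt)
weights = mi_weights([informative, constant], rt)
```

In practice a library estimator and a deep learning framework would replace these helpers; the point here is only how mutual-information scores could be turned into loss weights.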
Keywords/Search Tags:Metabolomics, Recognition of false positive mass spectrum features, Recognition of mass spectrum features with low abundance, Retention time prediction