
A Protein Identification Algorithm For Tandem Mass Spectrometry With Deep Learning

Posted on: 2021-02-08 | Degree: Master | Type: Thesis
Country: China | Candidate: R Xu | Full Text: PDF
GTID: 2370330614458615 | Subject: Biomedical engineering
Abstract/Summary:
Peptide identification from tandem mass spectrometry (MS/MS) data is one of the key computational tasks in proteomics, and protein sequence database searching is a commonly used method for it. Traditionally, a database search first constructs a theoretical spectrum for each candidate peptide, and that spectrum considers only the mass-to-charge ratio; the information carried by isotope peak intensity is neglected. With the rapid development of artificial intelligence in recent years, deep learning-based MS/MS spectrum prediction tools have shown high accuracy and great potential to improve the sensitivity and accuracy of protein sequence database searching. Building on the traditional protein sequence database search method, this thesis optimizes the search process by taking isotope peak intensity into account. The main research contents cover the following two aspects.

(1) Evaluation of common spectrum prediction tools based on machine learning and deep learning. Spectrum prediction with machine learning or deep learning models is an emerging method in computational proteomics. Several deep learning-based MS/MS spectrum prediction tools have been developed and have shown their potential not only for increasing the sensitivity and accuracy of data-dependent acquisition (DDA) search engines, but also for building spectral libraries for data-independent acquisition (DIA) analysis. Tools with different algorithms and implementations may perform differently, so it is necessary to evaluate them systematically to identify their preferences and intrinsic differences. In this study, we used multiple datasets covering different collision energies, enzymes, instruments, and species to evaluate the performance of the deep learning-based MS/MS spectrum prediction tools as well as the machine learning-based tool MS2PIP. We found that pDeep2 and Prosit outperformed the other tools in most cases, and that each tool has its own characteristics. For example, Prosit worked best for predicting spectra from Lumos, while pDeep2 was the best tool for predicting spectra from Q Exactive. We also measured the prediction time of the deep learning tools on GPU and CPU. With a GPU, prediction was extremely fast; with only a CPU, prediction speed was still adequate for many tasks.

(2) Optimization of the protein sequence database search method with deep learning. We selected the deep learning model pDeep2 to predict the theoretical mass spectra of all candidate peptides, and supplied the predicted fragment ion intensity to the protein sequence database search tool DeepNovo as one of the comparison parameters for peptide identification. In DeepNovo, we adjusted the intermediate processing between the input and output of the model and combined pDeep2 with DeepNovo to improve the sensitivity and accuracy of peptide identification. The results showed that recall and accuracy improved by about 2% at both the amino acid and peptide levels once isotope peak intensity was considered in peptide identification.

By evaluating the spectrum prediction tools, we systematically summarized the performance of each tool. We hope these evaluations provide helpful insights and guidelines on spectrum prediction tools for researchers in the field. The results of combining pDeep2 with DeepNovo showed that the accuracy of peptide identification and spectral resolution can be improved by considering isotope peak intensity.
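The contrast drawn above, between a theoretical spectrum that carries only mass-to-charge ratios and a comparison that also uses predicted peak intensities, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `by_ions` and `intensity_cosine` are hypothetical, the peptide is assumed to be unmodified with singly charged fragments, and cosine similarity is an assumed stand-in for whatever intensity-based comparison the thesis actually uses.

```python
import math

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    This is the m/z-only theoretical spectrum: every fragment is
    implicitly given the same weight, with no intensity information.
    Assumption: unmodified peptide, charge 1+ fragments only.
    """
    masses = [MONO[aa] for aa in peptide]
    b, y = [], []
    prefix = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), from the N-terminus
        prefix += m
        b.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), from the C-terminus
        suffix += m
        y.append(suffix + WATER + PROTON)
    return b, y


def intensity_cosine(predicted, observed):
    """Cosine similarity between two intensity vectors.

    Both vectors are assumed to be aligned to the same fragment ions
    (for example b1..b(n-1) followed by y1..y(n-1)). Returns 0.0 if
    either vector is all zeros.
    """
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0
    return dot / (norm_p * norm_o)


if __name__ == "__main__":
    b, y = by_ions("PEPTIDE")
    print("b ions:", [round(mz, 4) for mz in b])
    print("y ions:", [round(mz, 4) for mz in y])
    # Made-up intensity vectors, for illustration only.
    print(intensity_cosine([0.1, 0.8, 0.3], [0.2, 0.7, 0.4]))
```

In a search engine, a score of this kind would be one feature among several when ranking candidate peptides; how the predicted intensities are actually combined with the DeepNovo scoring is not specified in the abstract.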
Keywords/Search Tags:protein sequence database, tandem mass spectrometry, deep learning, peak strength of isotope