
Modeling Of Peptide Fragment Ion Intensities In Tandem Mass Spectrometry

Posted on: 2019-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: B P Gao
GTID: 2370330545971216
Subject: Software engineering
Abstract/Summary:
Protein sequence identification based on tandem mass spectrometry is a basic method in proteomics. Its aim is to infer the amino acid sequence of an unknown peptide from its experimental mass spectrum. Theoretical prediction of peptide mass spectra plays an extremely important part in this process. Accurate prediction not only benefits protein sequence identification and reduces the waste of mass spectral data, but also helps us understand the rules of peptide fragmentation in the mass spectrometer and study how different physicochemical properties affect fragmentation. It therefore allows peptide fragmentation to be simulated more accurately and improves the confidence of identified peptides, which makes a more accurate approach to predicting theoretical peptide mass spectra necessary.

In this work, to obtain more accurate theoretical mass spectra, we trained fragmentation models with deep belief networks and gradient boosted decision trees. The models use physicochemical features (mass, basicity, hydrophobicity, helicity, etc.) and sequence features (amino acid composition of peptides and fragment ions, identity of the amino acid at the peptide's C/N-terminus, etc.) extracted from peptides. We performed the following procedure:

1) To find the feature set that gives the best predictions, deep belief networks and gradient boosted decision trees were each used to build and test prediction models on different feature sets. On the best-performing feature set, the Pearson correlation coefficients of deep belief networks and gradient boosted decision trees are 0.826 and 0.877, respectively.

2) The best feature set and the importance of individual features were analyzed with gradient boosted decision trees. The four most important features are all related to the C-terminus of the peptide: the ratio of the mass of the C-terminal part (to the right of the fragmentation site) to the precursor mass; the basicity of the C-terminal part of the fragmented peptide; the difference between the mass of the C-terminal part (to the right of the fragmentation site) and the mass-to-charge ratio of the peptide; and the hydrophobicity of the C-terminal part of the fragmented peptide.

3) Experiments on two public data sets show that both proposed methods are more accurate than two representative methods, MassAnalyzer and OpenMS-Simulator. On the feature sets constructed in this work, gradient boosted decision trees perform better than deep belief networks.
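As a rough illustration of two of the C-terminal mass features and of the Pearson correlation used for evaluation, the sketch below may help. It is not the thesis code. The function names, the use of monoisotopic residue masses, and the choice to count the C-terminal part as its residues plus one water are all assumptions made here for the example; the abstract does not specify these details.

```python
import math

# Monoisotopic residue masses in daltons (standard values; that the thesis
# uses monoisotopic rather than average masses is an assumption).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # mass of H2O
PROTON = 1.007276   # mass of a proton


def neutral_mass(residues):
    """Neutral mass of a residue string: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER


def c_terminal_mass_features(peptide, site, charge):
    """Hypothetical helper: two mass features for one fragmentation site.

    `site` is the number of residues to the left of the cleaved bond,
    so the C-terminal part is peptide[site:].
    """
    if not 1 <= site < len(peptide):
        raise ValueError("site must split the peptide into two non-empty parts")
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    precursor = neutral_mass(peptide)
    c_part = neutral_mass(peptide[site:])
    precursor_mz = (precursor + charge * PROTON) / charge
    return {
        "mass_ratio": c_part / precursor,       # C-terminal mass / precursor mass
        "mass_minus_mz": c_part - precursor_mz, # C-terminal mass - precursor m/z
    }


def pearson(xs, ys):
    """Pearson correlation between predicted and observed intensities."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For example, `c_terminal_mass_features("PEPTIDEK", 3, 2)` would return both features for the bond after the third residue of a doubly charged precursor, and `pearson(predicted, observed)` would score one predicted spectrum against its measured intensities. The basicity and hydrophobicity features are left out because they depend on property scales that the abstract does not name.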
Keywords/Search Tags: Tandem mass spectrometry, Proteomics, Prediction of theoretical mass spectrum, Deep belief network, Gradient boosted decision tree