
Confidence Evaluation Of Peptide Sequences Based On The Sequence Model

Posted on: 2021-02-28 | Degree: Master | Type: Thesis
Country: China | Candidate: X Min
GTID: 2370330605467916 | Subject: Computer Science and Technology
Abstract/Summary:
In peptide sequence identification, scoring the match between a candidate peptide sequence and an experimental tandem mass spectrum (a peptide-spectrum match, PSM) is a very important step, and an accurate, effective PSM scoring algorithm can improve the accuracy of peptide identification. Traditional PSM scoring algorithms usually define the final score as a probability derived from the similarity between a predicted, simplified theoretical spectrum and the experimental spectrum, and therefore cannot take full advantage of the regularities of peptide fragmentation. To address this problem, a multi-class probabilistic confidence evaluation algorithm that incorporates a representation of peptide sequence information, Deep Score-?, is proposed. The algorithm uses a one-dimensional residual network to extract the underlying information in the sequence, then uses a multi-attention mechanism to integrate the effects of the other peptide bonds on the cleavage of the current peptide bond, producing the final probability matrix over the relative intensity distribution of the fragment ions. Candidate peptide sequences from Comet and MSGF+ were re-scored with this algorithm and compared with the original results. On a human proteome dataset, the number of peptides retained by Deep Score-? at FDR = 0.01 increased by about 14% compared with Comet and MSGF+, and the Top1 hit ratio (the proportion of spectra whose highest-scoring candidate is the correct peptide sequence) increased by about 5%. In a generalization test of the model trained on the human Proteome Tools2 dataset, the number of peptides retained by Deep Score-? at FDR = 0.01 increased by about 7% compared with Comet and MSGF+, the Top1 hit ratio increased by about 5%, and the number of Top1 identifications drawn from the decoy library decreased by about 60%. These results show that the algorithm retains more peptide sequences at FDR = 0.01 and improves the Top1 hit ratio. Deep Score-?
was implemented as a confidence evaluation software tool that can directly re-score the identification results of Comet and MSGF+ and recalculate the identifications at a given FDR threshold; the tool also accepts custom candidate peptides as input. In addition, it provides an FDR analysis function to help choose an FDR threshold that improves the identification results.
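The abstract says the model outputs a probability matrix over the relative intensity distribution of the fragment ions, but it does not say how that matrix becomes a single PSM score. The sketch below shows one plausible way, under assumptions that are not taken from the thesis: relative intensities are discretized into equal-width classes (the hypothetical helper `intensity_to_class`), each matrix row gives class probabilities for one fragment ion, and the score (the hypothetical `psm_log_likelihood`) is the mean log-probability of the observed classes.

```python
import math
from typing import Sequence


def intensity_to_class(rel_intensity: float, n_classes: int) -> int:
    """Map a relative intensity in [0, 1] to one of n_classes equal-width bins.

    Hypothetical discretization scheme, assumed for illustration only.
    """
    if not 0.0 <= rel_intensity <= 1.0:
        raise ValueError("relative intensity must lie in [0, 1]")
    # An intensity of exactly 1.0 goes into the top bin instead of overflowing.
    return min(int(rel_intensity * n_classes), n_classes - 1)


def psm_log_likelihood(prob_matrix: Sequence[Sequence[float]],
                       observed: Sequence[float],
                       eps: float = 1e-9) -> float:
    """Mean log-probability of the observed intensity classes (hypothetical score).

    prob_matrix[i][k]: predicted probability that fragment ion i falls in class k.
    observed[i]: observed relative intensity of fragment ion i (0.0 if absent).
    """
    if len(prob_matrix) != len(observed):
        raise ValueError("need exactly one probability row per fragment ion")
    total = 0.0
    for row, intensity in zip(prob_matrix, observed):
        k = intensity_to_class(intensity, len(row))
        total += math.log(max(row[k], eps))  # eps guards against log(0)
    return total / len(observed)


if __name__ == "__main__":
    observed = [0.9, 0.1]                     # two fragment ions, 3 classes
    good = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]  # agrees with the observed classes
    bad = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]   # disagrees with them
    print(psm_log_likelihood(good, observed), psm_log_likelihood(bad, observed))
```

Under this assumed scheme, a candidate peptide whose predicted matrix agrees with the observed intensities scores higher, so candidates for the same spectrum can be ranked by this value. Averaging rather than summing keeps peptides of different lengths on a comparable scale; the actual normalization used by Deep Score-? is not stated in the abstract.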
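The results are reported as peptides retained at FDR = 0.01 and as a Top1 hit ratio. The abstract does not give the exact FDR procedure; the sketch below assumes the common target-decoy estimate (decoy hits divided by target hits above a score cutoff, converted to q-values) and treats higher scores as better. The names `PSM`, `filter_at_fdr` and `top1_hit_ratio` are hypothetical, and score ties are not handled specially.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PSM:
    spectrum_id: str
    peptide: str
    score: float    # higher is better (assumption)
    is_decoy: bool  # True if the peptide comes from the decoy library


def filter_at_fdr(psms: List[PSM], fdr_threshold: float = 0.01) -> List[PSM]:
    """Return target PSMs whose target-decoy q-value is <= fdr_threshold."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when the score cutoff is placed at this PSM.
        fdrs.append(decoys / targets if targets else float("inf"))
    # q-value: smallest estimated FDR at this position or any lower-scoring one.
    q = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        q[i] = running
    return [p for p, qi in zip(ranked, q) if not p.is_decoy and qi <= fdr_threshold]


def top1_hit_ratio(candidates: Dict[str, List[Tuple[str, float]]],
                   truth: Dict[str, str]) -> float:
    """Fraction of spectra whose highest-scoring candidate is the true peptide."""
    spectra = [s for s, cands in candidates.items() if cands and s in truth]
    if not spectra:
        return 0.0
    hits = sum(1 for s in spectra
               if max(candidates[s], key=lambda c: c[1])[0] == truth[s])
    return hits / len(spectra)
```

For example, if the score-ranked list is three targets, one decoy, then two more targets, only the first three targets survive at FDR = 0.01, while a looser threshold of 0.25 keeps all five targets. Sweeping `fdr_threshold` and counting the retained peptides is one simple way to explore the trade-off that the tool's FDR analysis function is described as supporting.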
Keywords/Search Tags: peptide identification, confidence evaluation, attention mechanisms, residual network