
Identification Of Proteins By Mass Spectrum Data Analysis-Research Of Fragmentation Model, Phosphopeptides And Spectral Library Search Method

Posted on: 2011-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Miao
GTID: 2120360305968931
Subject: Food Science
Abstract/Summary:
Tandem mass spectrometry is an established and powerful analytical tool for protein identification. Biological experiments coupled with mass spectrometry produce far more spectra than can be interpreted manually, so the analysis of mass spectra relies on computational methods. Three general approaches have been developed for identifying proteins: database search, de novo sequencing, and the peptide sequence tag approach. In protein identification, the critical task is to deduce the amino acid sequence of an unknown peptide from its experimental spectrum, so accurate prediction of the theoretical spectrum from a peptide sequence is a key step. Predicting the theoretical spectrum requires quantitative knowledge of the fragmentation process in addition to a qualitative understanding of the peptide fragmentation mechanism, including factors such as the position and type of the peptide bond. Taking these factors into account can improve both the precision of theoretical spectrum prediction and the accuracy of protein identification.

Protein post-translational modifications (PTMs) play an important role in organisms, and phosphorylation is one of the most important PTMs. Studying the fragmentation patterns of phosphopeptides aids the identification of phosphopeptides and the validation of phosphorylation sites, so these patterns are worth studying.

Given the complexity of theoretical spectrum prediction, a newer strategy based on mass spectral libraries has been applied to protein identification. It avoids the difficulty of theoretical spectrum prediction. However, direct spectrum-to-spectrum matching leads to lower search speed, less accurate matches, larger memory requirements, and other problems, all of which reduce the efficiency of protein identification.

To address the problems above, this work makes the following contributions:

1. A new peptide fragmentation model is proposed. To overcome the difficulty posed by the b/y ratio, a "fragmentation event model" is introduced, which predicts the probability of fragmentation at each peptide bond rather than the intensity of the ions. In particular, the influences of both the position and the type of the peptide bond are quantified from a training set of MS/MS spectra. Experiments were carried out on several MS/MS data sets. The parameters quantified by the iterative learning algorithm agree well with known qualitative knowledge about peptide fragmentation, and the precision of theoretical spectrum prediction was examined as well.

2. Fragmentation patterns of phosphopeptides are investigated. The fragmentation model is applied to a well-annotated phosphopeptide training set, so that the iterative learning algorithm identifies phosphopeptides and learns their fragmentation rules at the same time. The experiments show that phosphopeptides can be identified well by the iterative learning algorithm and that the fragmentation rules differ between nonphosphorylated peptides and phosphopeptides.

3. A novel protein identification strategy based on a mass spectral library is developed. The collected mass spectra are classified and annotated. All available spectra of a given peptide sequence are then combined into a consensus spectrum. Next, an index is built using a "thumbnail" method. Finally, a scoring function finds the library spectrum most similar to the query spectrum. This method offers higher search speed, lower memory usage, and lower computational complexity. The average search speed is approximately 1 million matches per second on one CPU.
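The library search steps described in the abstract (build a consensus spectrum per peptide, then score a query spectrum against the library) can be illustrated with a minimal sketch. The abstract does not specify the scoring function, the consensus procedure, or the "thumbnail" index, so everything below is an assumption made for illustration: spectra are binned at a fixed width, replicates are combined by summing their unit-length binned vectors, and similarity is a normalized dot product, a common choice in spectral library search. The names `bin_spectrum`, `consensus`, `dot_score`, and `search`, the bin width, and the toy peak lists are all hypothetical, and the index step is not reproduced.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0  # assumed m/z bin width; not taken from the thesis


def bin_spectrum(peaks, bin_width=BIN_WIDTH):
    """Sum peak intensities into fixed-width m/z bins and scale to unit length."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in binned.values()))
    if norm == 0.0:
        return {}
    return {b: v / norm for b, v in binned.items()}


def consensus(replicates, bin_width=BIN_WIDTH):
    """Combine replicate spectra of one peptide: sum their unit-length
    binned vectors, then rescale the sum to unit length (assumed procedure)."""
    total = defaultdict(float)
    for peaks in replicates:
        for b, v in bin_spectrum(peaks, bin_width).items():
            total[b] += v
    norm = math.sqrt(sum(v * v for v in total.values()))
    if norm == 0.0:
        return {}
    return {b: v / norm for b, v in total.items()}


def dot_score(a, b):
    """Normalized dot product (cosine) of two unit-length binned spectra."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller spectrum
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def search(query_peaks, library):
    """Return (label, score) of the best-scoring entry in a non-empty library."""
    q = bin_spectrum(query_peaks)
    return max(((label, dot_score(q, spec)) for label, spec in library.items()),
               key=lambda item: item[1])


# Toy data: (m/z, intensity) pairs invented for illustration only.
library = {
    "pep_A": consensus([
        [(175.1, 100.0), (276.2, 50.0), (403.2, 80.0)],
        [(175.1, 90.0), (276.2, 60.0), (403.2, 70.0)],
    ]),
    "pep_B": consensus([
        [(147.1, 100.0), (260.2, 40.0), (389.3, 90.0)],
    ]),
}
query = [(175.1, 95.0), (276.2, 55.0), (403.2, 75.0), (500.3, 5.0)]
best_label, best_score = search(query, library)
```

This sketch scans every library entry in turn, which is the direct spectrum-to-spectrum matching whose cost the abstract criticizes. An index such as the "thumbnail" method mentioned above would presumably narrow the candidates before scoring, but its details are not given here.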
Keywords/Search Tags:Tandem mass spectrometry, Data analysis, Protein identification, Fragmentation model, Phosphorylation identification, Mass spectrum library