
Studies On The Peptide Identification Algorithms By Tandem Mass Spectrometry In Proteomics

Posted on: 2014-03-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N P Dong
GTID: 1220330431497889
Subject: Analytical Chemistry
Abstract/Summary:
Tandem mass spectrometry has become the method of choice for proteome analysis, but analyzing the large-scale datasets generated by these experiments remains a challenge. This is especially true for assigning peptide sequences to tandem mass spectra (MS/MS) by protein sequence database search with both high accuracy and high speed. To address this, the current work develops a heuristic approach based on mining a large number of high-quality spectra. The work is divided into two main parts.

The first part, comprising chapters two to four, focuses on mining large-scale MS/MS datasets. In chapter two, the fragmentation behavior associated with the proline effect in low-energy collision-induced dissociation (CID) peptide fragment ion mass spectra is investigated, and fragmentation maps of proline during the dissociation of peptide precursor ions are provided. In general, the bond N-terminal to proline is the preferred cleavage site and therefore gives rise to abundant fragment ion peaks. However, this selective cleavage can be significantly affected by other factors, such as the charge state of the precursor ion, the amino acid composition, and the position of proline in the peptide sequence. Furthermore, the proline effect competes with other selective fragmentation pathways, such as the aspartic acid effect and the y(N-2)/b2 pathway. These observations and fragmentation maps substantially extend current knowledge of the fragmentation rules of proline in low-energy CID mass spectrometry. In addition, the probability of selective cleavage N-terminal to proline is evaluated at each node of the fragmentation maps, providing a quantitative characterization of the proline effect.

Because scrambled ions are not considered during peptide identification, their influence on identification results has attracted growing interest in recent years.
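As an illustration of the per-node probability mentioned above, one simple estimator is the empirical frequency, among the spectra that fall on a given map node, of spectra whose dominant cleavage lies N-terminal to proline. The Python sketch below assumes hypothetical inputs that do not come from the thesis: a node label per spectrum (for example, precursor charge plus a proline-position class) and, for each peptide bond, a summed b/y intensity. The function names `dominant_cleavage_is_pro` and `cleavage_probabilities` are illustrative only.

```python
from collections import defaultdict


def dominant_cleavage_is_pro(sequence, bond_intensity):
    """True if the most intense peptide bond lies N-terminal to proline.

    bond_intensity maps bond index k (1..len(sequence)-1, the bond between
    residues k-1 and k, which yields b_k and y_(n-k)) to the summed b/y
    intensity observed for that bond.
    """
    best = max(bond_intensity, key=bond_intensity.get)
    return sequence[best] == "P"


def cleavage_probabilities(observations):
    """Empirical P(dominant cleavage N-terminal to Pro) for each map node.

    observations: iterable of (node, cleaved) pairs, where node is any hashable
    label (hypothetically a (precursor_charge, proline_position_class) tuple)
    and cleaved is the result of dominant_cleavage_is_pro for one spectrum.
    """
    counts = defaultdict(lambda: [0, 0])  # node -> [cleaved count, total count]
    for node, cleaved in observations:
        counts[node][1] += 1
        if cleaved:
            counts[node][0] += 1
    return {node: hits / total for node, (hits, total) in counts.items()}
```

For instance, three charge-2 spectra on one node, two of which are dominated by Xxx-Pro cleavage, give a probability of 2/3 for that node. A real analysis would also need to handle sparse nodes, where a frequency based on a handful of spectra is unreliable.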
Chapter three therefore presents a comprehensive investigation of scrambled ions. First, the extent of scrambled ions in low-energy CID MS/MS and the possible fragmentation rules governing their formation are investigated. The results show that scrambled ions are commonly present in tandem mass spectra, with a number fraction of 10% and intensities lower than 20% of the base peak; however, no predominant fragmentation rules are found. Next, experimental MS/MS spectra acquired on different platforms are identified with five algorithms representing three peptide identification strategies. To study the influence of scrambled ions on the identification results, the same datasets with these ions removed are constructed and identified with the same algorithms. Comparing the two sets of results shows that scrambled ions can affect identification results to some extent. Further investigation shows that scrambled ions mainly interfere with the extraction of y and b ions for spectrum-sequence matching during spectral preprocessing by the identification algorithms, which changes the match scores. With efficient spectral preprocessing methods or robust scoring schemes, however, this influence is negligible. These investigations show how scrambled ions occur in peptide MS/MS and how they affect peptide identification, and thus provide valuable information for predicting peptide fragment ion mass spectra and applying such predictions to peptide identification.

In chapter four, a novel peptide fragment ion mass spectrum prediction algorithm, pepMSPredictor, is developed based on mining a large-scale mass spectral dataset. pepMSPredictor first extracts fragment ion intensities according to the fragmentation pathways of the Competition in Fragmentation Pathway model and generates a variable set for each fragment ion.
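The variable-generation step of pepMSPredictor could look roughly like the sketch below. It is a hypothetical Python illustration, not the algorithm's actual variable set: it computes singly charged b/y m/z values from standard monoisotopic residue masses, assigns each peptide bond a simplified pathway label (proline effect, aspartic acid effect, or "other"), matches observed intensities within an assumed 0.5 Da tolerance, and records a few position and composition variables. The names `fragment_variables`, `pathway`, and `matched_intensity` are invented for this example.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, monoisotopic (Da)
PROTON = 1.00728   # proton mass (Da)
BASIC = set("RKH")


def pathway(n_res, c_res):
    """Hypothetical, simplified pathway label for the bond n_res-c_res."""
    if c_res == "P":
        return "proline"    # cleavage N-terminal to Pro
    if n_res == "D":
        return "aspartic"   # cleavage C-terminal to Asp
    return "other"


def matched_intensity(peaks, mz, tol=0.5):
    """Most intense peak within tol (Da) of mz, or 0.0 if none matches."""
    return max((inten for p, inten in peaks if abs(p - mz) <= tol), default=0.0)


def fragment_variables(sequence, precursor_charge, peaks=()):
    """One row of variables per peptide bond, using singly charged b/y ions."""
    n = len(sequence)
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    rows = []
    for k in range(1, n):  # bond between residues k-1 and k -> b_k and y_(n-k)
        b_mz = sum(masses[:k]) + PROTON
        y_mz = sum(masses[k:]) + WATER + PROTON
        rows.append({
            "bond": k,
            "relative_position": k / n,
            "n_residue": sequence[k - 1],
            "c_residue": sequence[k],
            "pathway": pathway(sequence[k - 1], sequence[k]),
            "basic_in_b": sum(aa in BASIC for aa in sequence[:k]),
            "basic_in_y": sum(aa in BASIC for aa in sequence[k:]),
            "precursor_charge": precursor_charge,
            "b_mz": b_mz,
            "y_mz": y_mz,
            "b_intensity": matched_intensity(peaks, b_mz),
            "y_intensity": matched_intensity(peaks, y_mz),
        })
    return rows
```

Rows produced this way could then be grouped by their `pathway` value before any per-pathway modeling. Only singly charged ions are handled here; multiply charged fragments would need their own m/z values.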
Next, pepMSPredictor uses a classification tree to divide the fragment ion intensity set generated by each fragmentation pathway into many disjoint regions. For each region, or group of regions, it applies stochastic gradient boosting trees to relate the fragment ion intensities to the corresponding variables and thereby build regression models. Once the models are built, the intensities they predict are combined to form a predicted mass spectrum. Tests on a standard protein mixture dataset show that pepMSPredictor predicts peptide fragment ion mass spectra accurately. pepMSPredictor is also extensible: it gives reasonable predictions for different instrument platforms.

The second part of the work aims to develop efficient methods for filtering out uninterpretable MS/MS spectra and avoiding repeated database searches before protein sequence database searching, because both factors can significantly distort identification results in proteomic research and consume substantial time in the qualitative and quantitative analysis of the data. As a first step, several peak extraction methods are studied, since this procedure can strongly affect the results of subsequent analysis, and the best-performing method and its optimal parameters are determined. In addition, a simple heuristic deisotoping method is developed based on simplified isotopic distributions of fragment ions; it effectively removes isotopic peaks from experimental MS/MS spectra. Spectral quality assessment and charge determination algorithms are then developed on top of these peak processing methods.
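A minimal sketch of deisotoping under a simplified isotopic model follows. It assumes, as a simplification, that within an isotopic cluster each successive peak lies about 1.00336/z m/z units above the previous one and is less intense than it, which is a reasonable approximation for small fragment ions. The function name, the 0.02 tolerance, and the single user-supplied charge are hypothetical choices, not parameters from the thesis.

```python
NEUTRON_SPACING = 1.00336  # approximate mass difference between 13C and 12C isotopologues (Da)


def deisotope(peaks, charge=1, tol=0.02):
    """Remove isotopic satellite peaks from a list of (mz, intensity) pairs.

    Simplified model (assumption): each isotope peak lies NEUTRON_SPACING/charge
    above the previous peak of its cluster and is less intense than that peak.
    """
    peaks = sorted(peaks)
    step = NEUTRON_SPACING / charge
    removed = set()
    for i, (mz, intensity) in enumerate(peaks):
        if i in removed:
            continue  # already assigned to an earlier cluster
        last_mz, last_int = mz, intensity
        for j in range(i + 1, len(peaks)):
            mz_j, int_j = peaks[j]
            if mz_j - last_mz > step + tol:
                break  # no further members of this cluster
            if j not in removed and abs(mz_j - last_mz - step) <= tol and int_j < last_int:
                removed.add(j)  # isotope of the current cluster
                last_mz, last_int = mz_j, int_j
    return [p for k, p in enumerate(peaks) if k not in removed]
```

For example, a cluster at 500.0, 501.0034, and 502.0067 with decreasing intensities collapses to the peak at 500.0. A rising second peak is kept, because the decreasing-intensity rule does not hold for it under this model.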
To make the spectral quality assessment and charge determination algorithms applicable to MS/MS datasets from different instrument platforms, the current work extracts a large number of variables from each MS/MS spectrum and builds a separate model for each dataset by linear discriminant analysis (LDA). Tests on large-scale datasets show that the spectral quality assessment models effectively remove uninterpretable (i.e., low-quality) MS/MS spectra. When the two spectral filtering models are combined, more than 60% of incorrect identifications are removed while 90% of correct ones are retained. These results demonstrate the strong performance of the algorithms developed in the current work.
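The LDA step can be illustrated with a two-class Fisher discriminant written in plain Python. This is a sketch under stated assumptions: each spectrum is already reduced to a numeric feature vector (the thesis extracts many variables; the two-feature vectors in the usage example are hypothetical), the classes are "interpretable" and "uninterpretable", and the decision threshold is the midpoint of the projected class means, which corresponds to equal class priors. A small ridge term is added to the pooled scatter matrix for numerical stability.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def project(w, x):
    """Projection of feature vector x onto the discriminant direction w."""
    return sum(wk * xk for wk, xk in zip(w, x))


def fit_lda(good, bad, ridge=1e-6):
    """Fit a two-class Fisher LDA; returns (weights, threshold)."""
    d = len(good[0])

    def mean(rows):
        return [sum(r[k] for r in rows) / len(rows) for k in range(d)]

    mu_good, mu_bad = mean(good), mean(bad)
    scatter = [[0.0] * d for _ in range(d)]  # pooled within-class scatter
    for rows, mu in ((good, mu_good), (bad, mu_bad)):
        for r in rows:
            diff = [r[k] - mu[k] for k in range(d)]
            for i in range(d):
                for j in range(d):
                    scatter[i][j] += diff[i] * diff[j]
    for i in range(d):
        scatter[i][i] += ridge
    w = solve(scatter, [mu_good[k] - mu_bad[k] for k in range(d)])
    threshold = (project(w, mu_good) + project(w, mu_bad)) / 2
    return w, threshold


def is_interpretable(x, w, threshold):
    """True when the spectrum's projection falls on the interpretable side."""
    return project(w, x) > threshold
```

For example, fitting on good = [[5.0, 0.8], [6.0, 0.9], [5.5, 0.7]] and bad = [[1.0, 0.2], [1.5, 0.1], [2.0, 0.3]] classifies [5.8, 0.85] as interpretable and [1.2, 0.15] as not. Fitting one such model per dataset mirrors the per-platform modeling described above. Retention and removal rates would depend on where the threshold is placed.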
Keywords/Search Tags: proteomics, data mining, peptide identification, tandem mass spectrum, prediction of tandem mass spectra