Research On Preprocessing Methods Of Tandem Mass Spectral Data

Posted on: 2007-09-28  Degree: Master  Type: Thesis
Country: China  Candidate: C Zou  Full Text: PDF
GTID: 2178360185954140  Subject: Computer application technology
Abstract/Summary:
In recent years, mass spectrometry (MS) has been one of the most successful techniques in proteomics research. In particular, tandem mass spectrometry (MS/MS) has been widely used for high-throughput protein identification. Preprocessing of MS/MS data plays an important role in peptide identification, since the large number of noise peaks and isotopic peaks can lead to heavy computation and low accuracy. This thesis focuses on preprocessing methods for low-accuracy ion trap mass spectral data. Two problems are addressed: identifying the noise baseline and identifying isotopic peaks, which are discussed in Chapter 3 and Chapter 4, respectively.

In Chapter 2, mass spectrometry is introduced briefly, including its principle, the components of a mass spectrometer, and methods for processing mass spectral data. The whole process of protein identification using mass spectra is then explained, and the necessity of data preprocessing is analyzed. Finally, the current algorithms and systems for MS/MS data preprocessing are summarized.

In Chapter 3, based on our observations, we propose a mixture model to characterize the intensity distribution in a spectrum. Specifically, a normal distribution is used to model low-intensity noise peaks and a Gamma distribution is used for valid signal peaks. The parameters of the mixture model are estimated using the expectation maximization algorithm and a fast estimation method. Results show that our method outperforms fixed-baseline identification methods at the same level of data reduction.

Some existing methods identify the first isotopic peak by using the isotopic pattern in high-accuracy mass spectral data. Unfortunately, ion trap mass spectral data have so far not been accurate enough for these methods to apply. In Chapter 4, the relationship between the first isotopic peak and its mass and intensity is shown. Using this relationship, we construct several features that characterize the real first isotopic peak. Then, using a machine learning method, we obtain rules that distinguish first isotopic peaks from monoisotopic peaks. After deleting the first isotopic peaks from the data, we improve peptide identification accuracy in several experiments.

In the last part of this thesis, extensive experimental results are presented, comparing our method with the preprocessing method in the pFind software. The results show that our method not only reduces identification time by nearly 15%, but also increases the number of identified peptides at the same false positive rate.
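
As an illustration of the Chapter 3 idea, the following is a minimal sketch of fitting a two-component mixture (normal for noise, Gamma for signal) to peak intensities with an EM-style loop. It is not the thesis's implementation: the function names, the median-split initialization, the synthetic data, and the weighted moment-matching update for the Gamma parameters are all assumptions made for this example, and the abstract does not specify the "fast estimation method" it mentions.

```python
import math
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution, evaluated elementwise."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, shape, scale):
    """Density of a Gamma distribution (x > 0), computed in log space."""
    log_pdf = ((shape - 1.0) * np.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return np.exp(log_pdf)


def fit_noise_signal_mixture(intensities, n_iter=200):
    """Fit w * Normal(mu, sigma) + (1 - w) * Gamma(shape, scale).

    Hypothetical helper: the Gamma update uses weighted moment matching,
    an approximation to the exact M-step (which has no closed form).
    """
    x = np.asarray(intensities, dtype=float)
    x = x[x > 0]  # the Gamma density is only defined for positive values

    # Assumed initialization: lower half is noise, upper half is signal.
    med = np.median(x)
    low, high = x[x <= med], x[x > med]
    w = 0.5
    mu, sigma = low.mean(), low.std() + 1e-9
    shape = high.mean() ** 2 / (high.var() + 1e-12)
    scale = (high.var() + 1e-12) / high.mean()

    for _ in range(n_iter):
        # E-step: posterior probability that each peak is noise.
        p_noise = w * normal_pdf(x, mu, sigma)
        p_signal = (1.0 - w) * gamma_pdf(x, shape, scale)
        r = p_noise / (p_noise + p_signal + 1e-300)

        # M-step for the mixing weight and the normal component.
        w = float(np.clip(r.mean(), 1e-6, 1.0 - 1e-6))
        r_sum = r.sum() + 1e-12
        mu = float((r * x).sum() / r_sum)
        sigma = math.sqrt(float((r * (x - mu) ** 2).sum() / r_sum)) + 1e-9

        # Approximate M-step for the Gamma component (moment matching).
        s = 1.0 - r
        s_sum = s.sum() + 1e-12
        m = float((s * x).sum() / s_sum)
        v = float((s * (x - m) ** 2).sum() / s_sum) + 1e-12
        shape, scale = m * m / v, v / m

    return {"w": w, "mu": mu, "sigma": sigma,
            "shape": shape, "scale": scale, "x": x, "noise_posterior": r}


# Synthetic spectrum (made-up numbers): 2000 noise peaks, 500 signal peaks.
rng = np.random.default_rng(0)
demo_intensities = np.concatenate([
    rng.normal(10.0, 2.0, size=2000),
    rng.gamma(3.0, 40.0, size=500),
])
fit = fit_noise_signal_mixture(demo_intensities)

# Keep only peaks that are more likely signal than noise.
kept = fit["x"][fit["noise_posterior"] < 0.5]
```

With a fit like this, the noise baseline adapts to each spectrum through the posterior threshold instead of being a single fixed intensity cutoff, which is the contrast the abstract draws with fixed-baseline methods.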
Keywords/Search Tags:Tandem mass spectra, Protein identification, Ion trap, Isotopic, Statistical model, Machine learning