Glycan De Novo Sequencing with Tandem Mass Spectrometry

Posted on: 2011-11-21
Degree: Ph.D.
Type: Dissertation
University: The University of Western Ontario (Canada)
Candidate: Shan, Baozhen
Full Text: PDF
GTID: 1444390002469437
Subject: Computer Science
Abstract/Summary:
The structural variation in glycans is fundamental to their biological activity. One of the most powerful tools for glycan structure determination is tandem mass spectrometry. Interpreting the tandem mass spectra of glycopeptides with a de novo approach is essential for determining novel glycan structures. In this work, we examine the glycan de novo sequencing problem.

We define glycan de novo sequencing as follows. Let M = {(m_i, I_i) | 1 ≤ i ≤ n} be a spectrum of a glycopeptide, where m_i is the mass and I_i is the intensity of the i-th peak. For each mass value m, a score function f(m) can be defined according to the intensity of the peak near m. Let T be a glycan tree. The score of T, S(T), is defined as the sum of f(m) over all mass values m of the fragment ions of T. The glycan de novo sequencing problem is then to find a tree structure T such that the mass of T equals a given value M and S(T) is maximized.

We proved that glycan de novo sequencing is NP-hard for an arbitrary score function, and then developed a heuristic algorithm for the problem. The algorithm first generates many acceptable small subtrees, which are then joined together in an iterative process to obtain larger suboptimal subtrees until the desired mass is reached. At each subtree size, only a limited number of subtrees are kept for later use.

Experiments on real MS/MS data of glycopeptides from the cationic isozyme of peanut peroxidase showed that the heuristic algorithm can determine glycan structures accurately.

We use a labelled tree to represent a glycan structure. Let S be the alphabet of simple sugars. A glycan tree T is an unordered rooted tree of bounded degree whose nodes are labelled by letters from S. The degree of a glycan tree is bounded by 4. The root of T is linked to a peptide.

Keywords: Glycomics, proteomics, tandem mass spectrometry, glycan, glycoprotein, de novo sequencing.
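The scoring definition in the abstract can be illustrated with a small Python sketch. It is not the dissertation's implementation, and several pieces are assumptions made for illustration. The abstract does not fix the form of f(m); here it is taken to be the highest peak intensity within a mass tolerance. The fragment model is simplified to a single glycosidic cleavage that keeps the peptide side. The three-sugar alphabet with monoisotopic residue masses, the peptide mass, the example spectrum, and all function names are hypothetical.

```python
from typing import List, Tuple

# Assumed alphabet: monoisotopic residue masses (Da) of three common sugars.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579}

Tree = Tuple[str, list]      # (label, [child trees]); the root links to the peptide
Peak = Tuple[float, float]   # (mass, intensity)


def subtree_masses(tree: Tree) -> List[float]:
    """Mass of the subtree rooted at every node; the whole tree comes first."""
    label, children = tree
    total = RESIDUE_MASS[label]
    below: List[float] = []
    for child in children:
        child_masses = subtree_masses(child)
        total += child_masses[0]
        below.extend(child_masses)
    return [total] + below


def tree_mass(tree: Tree) -> float:
    return subtree_masses(tree)[0]


def peak_score(m: float, spectrum: List[Peak], tol: float = 0.5) -> float:
    """Assumed form of f(m): highest intensity within tol of m, else 0."""
    return max((i for mass, i in spectrum if abs(mass - m) <= tol), default=0.0)


def fragment_masses(tree: Tree, peptide_mass: float) -> List[float]:
    """Assumed fragment model: cut the bond above one node, keep the peptide side."""
    masses = subtree_masses(tree)
    total = masses[0]
    return [peptide_mass + total - lost for lost in masses]


def tree_score(tree: Tree, spectrum: List[Peak], peptide_mass: float) -> float:
    """S(T): sum of f(m) over the fragment masses of T."""
    return sum(peak_score(m, spectrum) for m in fragment_masses(tree, peptide_mass))


# Two candidate trees with the same composition and therefore the same mass.
linear = ("HexNAc", [("HexNAc", [("Hex", [])])])
branched = ("HexNAc", [("HexNAc", []), ("Hex", [])])

# Hypothetical spectrum and peptide mass, chosen only to exercise the code.
spectrum = [(1000.0, 10.0), (1203.08, 50.0), (1365.13, 5.0), (1406.16, 30.0)]
peptide_mass = 1000.0
target_mass = tree_mass(linear)

# Keep candidates that match the target mass, then pick the highest S(T).
candidates = [t for t in (linear, branched) if abs(tree_mass(t) - target_mass) < 1e-6]
best = max(candidates, key=lambda t: tree_score(t, spectrum, peptide_mass))
```

Under these assumptions the linear tree scores higher, because its fragment near 1203.08 matches the most intense peak. The heuristic described in the abstract avoids enumerating every candidate in this way. It builds larger subtrees from smaller ones and keeps only a limited number at each size.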
Keywords/Search Tags:Glycan, De novo sequencing, Tandem mass