An expert system incorporating data simulation, feature recognition, model fitting, and data analysis functions

Posted on: 2010-05-03
Degree: Ph.D
Type: Thesis
University: University of Colorado at Denver
Candidate: Sun, Shaojun
Full Text: PDF
GTID: 2448390002975488
Subject: Biology
Abstract/Summary:
A major limitation in protein identification from complex mixtures is the ability of search programs to accurately assign peptide sequences using mass spectrometric fragmentation spectra (MS/MS). Manual analysis is used to assess borderline identifications; however, it is error-prone and time-consuming, and criteria for acceptance or rejection are not well defined. The primary computational strategy for MS/MS identification requires the prediction of spectra from peptide sequences. However, even state-of-the-art algorithms such as MASCOT do not evaluate intensity information in experimental spectra, considering only the theoretical fragment masses of a candidate peptide sequence. In this thesis, I report a Manual Analysis Emulator (MAE) expert system which implements criteria used in manual analysis of low energy collision-activated dissociation (CAD) spectra.
MAE evaluates the chemical plausibility of peptide assignments by measuring the similarity between experimental spectra and predicted spectra. The predicted spectra are simulated using a kinetic model of peptide fragmentation for each candidate sequence assignment. The kinetic model used in the initial version of MAE was developed by Z. Zhang and is based on known gas phase mechanisms of peptide dissociation (Anal. Chem. 2004, 76:3908-3922), as implemented in the software program MassAnalyzer. To add new chemical mechanisms and use a more powerful optimization method for parameter fitting, I created my own implementation of the kinetic model, S3. Parameters are fit using a constrained Levenberg-Marquardt (LM) algorithm with a novel merit function (dSIM). Classic chi-square LM fitting followed by dSIM score fitting significantly improves the similarity between the experimental and predicted spectra. Receiver operating characteristic (ROC) plots for peptide identification using S3 in conjunction with MAE demonstrate that the constrained LM algorithm improves discrimination.
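The comparison of experimental and predicted spectra described above can be illustrated with a minimal sketch. The abstract does not define the dSIM merit function, so the code below substitutes a plain cosine similarity over binned peak intensities; the function names, the 1.0 m/z bin width, and the example peak lists are all assumptions made for illustration, not part of MAE or S3.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks is an iterable of (mz, intensity) pairs; the result maps a bin
    index to the total intensity that fell into that bin.
    """
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(experimental, predicted, bin_width=1.0):
    """Cosine similarity between two spectra (a stand-in for dSIM).

    Returns a value in [0, 1] for non-negative intensities: 1 when the
    binned intensity patterns are proportional, 0 when no bins overlap.
    """
    a = bin_spectrum(experimental, bin_width)
    b = bin_spectrum(predicted, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: (m/z, relative intensity).
experimental = [(175.1, 40.0), (262.2, 100.0), (375.3, 65.0)]
predicted = [(175.1, 35.0), (262.2, 90.0), (488.4, 20.0)]
score = cosine_similarity(experimental, predicted)
```

Unlike a mass-only match, a score of this kind drops when a fragment is predicted at the right mass but with the wrong relative intensity, which is the information the abstract says mass-based search programs leave unused.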
Additionally, the fast running time and effective convergence to local minima in S3 will aid in the testing of new chemical mechanisms. Machine learning methods are used to determine the factors (features) that contribute most significantly to inaccurate prediction in the current kinetic model (Zhang, 2004, 2005), thus revealing where new chemical mechanisms are required. We demonstrate improvement in prediction accuracy by incorporating a novel N2 C-terminal proline cleavage mechanism into the kinetic model.
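The ROC evaluation mentioned in the abstract can be sketched in the same spirit. The code below assumes each candidate peptide assignment has a similarity score and a known correct/incorrect label; it sweeps a score threshold to obtain ROC points and integrates them with the trapezoid rule. The function names and the example scores are hypothetical and do not come from the thesis.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate).

    scores are similarity scores (higher means more likely correct);
    labels are 1 for a correct assignment and 0 for an incorrect one.
    Tied scores are processed together so they yield a single point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one correct and one incorrect label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a list of (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scores for four candidate assignments.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 1, 0, 0]
example_auc = area_under_curve(roc_points(example_scores, example_labels))
```

Comparing the area under the curve for two scoring schemes on the same labeled assignments is one simple way to quantify a claim that one scheme "improves discrimination"; the thesis itself may use a different summary.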
Keywords/Search Tags:Model, Peptide, Fitting, New chemical mechanisms, Spectra, MAE