Algorithm Researches And Applications In Quantitative Proteomics

Posted on: 2016-12-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Chang
GTID: 1220330461491116
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Twenty years have passed since Marc Wilkins coined the word “proteome” in 1994. As one of the popular research fields of the post-genomic era, proteomics is developing rapidly, driven by continual improvements in mass spectrometry (MS) and experimental technologies. MS has become one of the dominant technologies in proteomics thanks to its high throughput and high resolution. MS data analysis has therefore become one of the major research topics in proteomic bioinformatics.

The quality of MS data used to be relatively poor because of low precision and resolution and heavy noise in the spectra. Researchers had to solve identification problems, including how to assign the correct peptide sequence to each MS spectrum and how to filter out false positives from the identification results (also known as the quality control problem). Encouragingly, through the joint efforts of researchers in various fields, the precision and resolution of MS have improved greatly, and the corresponding identification methods and tools keep improving. These problems now have adequate solutions, and the research focus in proteomics is shifting from identification to quantification. Quantitative proteomics has become one of the important areas of -omics research. It covers not only abundance changes of the same proteins across different samples but also the abundances of different proteins within one sample. Quantitative proteomics is vital to the study of protein-protein interactions and to biomarker discovery.

Quantitative proteomics can be divided into two categories according to research purpose: relative quantification and absolute quantification. Strategies for relative quantification can be further classified as stable-isotope labeling or label-free according to the experimental method. The number of experimental strategies and methods for quantitative proteomics has been increasing.
However, the quantitative analysis methods and software tools that support them have not improved at the same pace. On the one hand, the rapidly growing amount of MS data demands higher accuracy, precision and computational efficiency from quantitative algorithms. On the other hand, developing quantitative algorithms in step with improved experimental design and technology will enable deeper exploration of high-quality MS data.

To explore and address these issues, we focus on quantitative algorithm research and on the development and application of quantitative software tools, which will contribute methodologically to the future development of proteomics. Our work is detailed as follows.

(1) Relative quantification algorithms with quality-control methods.
At the spectrum level, we put forward a novel algorithm, the “dynamic isotope matching tolerance algorithm”, for matching experimental and theoretical isotope clusters, which improves quantitative sensitivity. At the peptide level, we first proposed the concept of quantitative confidence and defined three quantitative confidence filters and two confidence scores. The confidence filters eliminate false positives from quantification results for quantitative quality control, and the scores measure the confidence of each quantification result. At the protein level, we implemented three algorithms to remove peptide outliers. Finally, based on these algorithms, we developed SILVER, an efficient tool for quantitative analysis of stable-isotope labeling MS data with quality-control methods. SILVER was validated on one large-scale complex dataset and two standard datasets with different labeling ratios and proved to be accurate and sensitive.

(2) Protein absolute quantification algorithms based on peptide quantitative efficiency.
We first introduced a new concept, “peptide quantitative efficiency”, defined as the efficiency with which a peptide is identified and quantified in MS; it represents the relation between the MS-derived peptide intensity and the real peptide abundance. Second, we collected 587 features of peptide properties and, based on a semi-supervised model, determined the sample-specific peptide quantitative efficiency, named Qscore. Finally, we applied Qscore to protein absolute quantification and compared it with two other popular absolute quantification methods, iBAQ and APEX, on three datasets of increasing sample complexity. The results showed that Qscore significantly decreased the quantification biases among the peptides of one protein and decreased the error between repeated quantifications, suggesting that Qscore is more accurate and reproducible, especially in highly complex samples.

(3) Development of a comprehensive and parallel quantification software package named PANDA.
Building on the protein quantification algorithm research, we designed and developed a comprehensive quantitative algorithm library at the spectrum, peptide and protein levels, which allows PANDA to handle stable-isotope labeling, label-free and absolute quantitative MS data. We also designed an efficient multi-level, multi-threaded parallel framework that allows parallel computation among or within fractions, and the corresponding quantitative algorithms were modified and refined for this framework. In addition, PANDA provides data visualization and statistical analysis functions. The data visualization module includes a table view and a graphic view of quantification results. The statistical analysis covers the whole workflow of differentially expressed protein selection, from missing value imputation to statistical tests.

(4) Construction and application of a large-scale quantitative data analysis workflow.
Recently, a huge amount of heterogeneous MS data has been generated by different laboratories using different experimental strategies in the Chromosome-Centric Human Proteome Project (C-HPP). To analyze and integrate the multi-lab quantitative proteomics data, we developed an automated quantification and normalization workflow for large-scale MS data, which eliminates the biases arising from different MS platforms and different experimental operations.

In summary, our study is closely tied to the latest experimental and technological developments in quantitative proteomics and focuses on in-depth exploration of MS data from a quantitative perspective. We concentrated on the study of quantitative algorithms and the development of comprehensive quantitative software tools. These results have been applied to several large-scale quantitative MS data analyses, such as C-HPP, and lay a solid foundation for future studies in quantitative proteomics.
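
As a rough illustration of the kind of cross-laboratory normalization described in (4), the sketch below aligns protein intensities from several runs by median centering in log2 space. The abstract does not state which normalization method the workflow uses, so median centering is an assumption made purely for illustration; the function name `median_normalize`, the run labels and the intensity values are all hypothetical.

```python
import math
from statistics import median


def median_normalize(runs):
    """Median-center log2 intensities so that all runs share one median.

    runs: dict mapping a run name to a dict of protein -> raw intensity.
    Assumes every run contains at least one positive intensity.
    This is an illustrative method, not the one used in the dissertation.
    """
    # Work in log2 space; non-positive intensities are treated as missing.
    log_runs = {
        run: {p: math.log2(v) for p, v in prots.items() if v > 0}
        for run, prots in runs.items()
    }
    # Per-run median captures the run-wide offset (platform or handling bias).
    run_medians = {run: median(vals.values()) for run, vals in log_runs.items()}
    # Shift every run so its median equals the median of the run medians.
    global_median = median(run_medians.values())
    return {
        run: {p: v - run_medians[run] + global_median for p, v in vals.items()}
        for run, vals in log_runs.items()
    }


# Hypothetical data: lab_B reports every protein 8 times higher than lab_A.
runs = {
    "lab_A": {"P1": 100.0, "P2": 400.0, "P3": 1600.0},
    "lab_B": {"P1": 800.0, "P2": 3200.0, "P3": 12800.0},
}
normalized = median_normalize(runs)
```

In this toy case the constant eightfold offset between the two runs disappears after normalization, while the relative differences among proteins within each run are preserved. Real multi-lab data would also need handling of missing values and of proteins observed in only some runs.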
Keywords/Search Tags: Proteomics, Bioinformatics, Mass spectrometry, Quantitative algorithm