Research on Quality Control Tools for Protein Identification Based on Mass Spectrometry Data

Posted on: 2019-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: N Q Qiu
GTID: 2370330566460376
Subject: Biomedical engineering
Abstract:
In recent years, the continuous development of proteomics has brought revolutionary progress in experimental methods and instrument technology, and has also led to rapid growth in data scale. In both breadth and depth, proteomics data now fall into the category of big data, which places higher demands on data processing capabilities. Unlike genomics data, proteomics data are mainly derived from spectra produced by mass spectrometers. The characteristics of the commonly used shotgun mass spectrometry technology make the mapping of spectra to peptides or proteins very unstable. However, the quality control tools used to verify and correct mass spectrometry data and the follow-up analysis are not satisfactory. With the rapid expansion of proteomics data, the problem of data confidence is becoming increasingly prominent, because the reliability of subsequent knowledge discovery depends on the accuracy of identification results. This study explored quality control methods for each step of proteomics analysis and, in cooperation with a proteomics laboratory, worked out a pipeline for mass spectrometry data processing and quality control. Based on the Galaxy framework, we combined mainstream analysis tools and quality control methods such as Mascot and Percolator to construct an automated, high-throughput workflow. This workflow improved the efficiency of data analysis, including quality control, so that the speed of data processing can keep up with MS data production, and it greatly facilitates routine laboratory analysis. Meanwhile, to make full use of our Galaxy workflow, we developed a program called IDM, which combines ideas from two well-known protein quality control programs (IDPicker and MAYU), and embedded it into the workflow. In later analysis, we selected a set of high-quality gastric cancer data and pre-processed it with our workflow; IDM and three other published quality control tools were then used to perform quality control at the protein level. By analyzing the differences among the protein identification results and the final biological conclusions, we found that IDM was a more reasonable method for protein-level quality control. In general, the workflow and quality control methods designed in this study can provide a reference for protein-level quality control of mass spectrometry data and, to some extent, promote the development of proteomics.
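As an illustration of what protein-level quality control can involve, the sketch below filters a list of protein identifications by an estimated false discovery rate using the target-decoy strategy, a common approach in this area. It is not a description of IDM's actual algorithm, which the abstract does not specify. The `ProteinHit` record, the `filter_by_fdr` function, the single-score ranking, and the simple decoys/targets estimator are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProteinHit:
    """Hypothetical record for one identified protein (not from the thesis)."""
    accession: str
    score: float      # higher means a more confident identification
    is_decoy: bool    # True if the protein comes from the decoy database


def filter_by_fdr(hits: Iterable[ProteinHit], max_fdr: float = 0.01) -> List[ProteinHit]:
    """Return target proteins whose estimated q-value is at most max_fdr.

    Assumption: FDR at a given rank is estimated as decoys / targets
    among all hits ranked at or above that position.
    """
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)

    # Running FDR estimate at each rank.
    targets = decoys = 0
    fdrs = []
    for hit in ranked:
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank,
    # which makes the acceptance threshold monotonic in the score.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [h for h, q in zip(ranked, qvalues) if not h.is_decoy and q <= max_fdr]
```

Real protein-level tools go further than this. For example, they handle peptides shared between proteins and correct for database size, which is why different tools can report different protein lists from the same data.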
Keywords: Proteomics, Bioinformatics, Quality control, Automated workflow