Design And Implementation Of Direct Protein Identification System Based On Mass Spectrometry Data

Posted on: 2014-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
GTID: 2250330425483938
Subject: Software engineering
Abstract/Summary:
With the launch of the Human Genome Project, proteomics has developed rapidly and is now widely used in related fields. Protein identification, which centers on the analysis of mass spectrometry data, has become one of the key areas of proteomics research. At present, protein identification relies on paid database search tools such as Mascot or on free databases. Given the complexity of protein analysis, paid tools such as Mascot perform better but cannot be deployed on a large scale. More importantly, because of the way it operates, a mass spectrometer may discard spectral peaks that have low intensity but are nevertheless highly valuable. In view of these problems, this dissertation studies a direct protein identification system that does not rely on paid databases, with the aim of embedding the algorithm into the mass spectrometer for real-time processing in order to obtain more accurate data.

First, this dissertation discusses the background and significance of proteomics research and introduces the basic process of mass spectrum analysis and the structure of protein mass spectra, as well as the state of the art in protein research in China and abroad. Second, following a brief introduction to the protein identification process based on mass spectrometry, it describes in detail several typical protein identification algorithms, including de novo sequencing, sequence database searching, tag-based searching, and mass spectral database searching. It then puts forward the basic idea of the direct protein identification algorithm and focuses on several key steps: the direct identification process, mass spectrometry data preprocessing, first-stage (MS1) data analysis, and second-stage (MS2) data analysis. In addition, by combining the methods "High-Resolution Analysis of Spectra" and "Rapid validation via stable isotope labeling", this dissertation proposes the direct protein identification process.
Finally, building on the biological data analysis platform Galaxy, the dissertation presents the design and implementation of the direct protein identification system. The system consists of three key modules: mass spectrometry analysis, first-stage (MS1) analysis, and second-stage (MS2) analysis. Following this design, the system is implemented in C++, Python, and Perl, languages well suited to mass spectrometry data, and integrates open-source tools; it can be accessed at sam.galaxcloud.com. The final tests show that a typical mass spectrometry file (Raw file) can be used as the system input and that, after preprocessing, format conversion, and first- and second-stage MS analysis, proteins can be identified without relying on a protein database. Moreover, after further simplification, the algorithm could be embedded into a mass spectrometer to further improve its efficiency.
Keywords/Search Tags: Protein, Mass Spectrometry, Spectral Peaks, Raw Files, Galaxy Platform, Mascot Retrieval Software
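The abstract names de novo sequencing as one of the typical database-free identification approaches. As a rough illustration of the general idea only, and not of the dissertation's own algorithm, the Python sketch below reads residues from the mass differences between neighbouring fragment peaks. The function names `match_residue` and `read_ladder`, the 0.01 Da tolerance, and the example peak list are all assumptions made for illustration; the sketch also assumes singly charged fragments from a single ion series with no missing peaks, which real spectra rarely satisfy.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
# Leucine and isoleucine have identical mass, so they share one label.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_residue(delta, tol=0.01):
    """Return the residue labels whose mass lies within tol Da of delta."""
    return [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]


def read_ladder(peaks_mz, tol=0.01):
    """Label each gap between consecutive peaks with the matching residue(s).

    Assumes (hypothetically) singly charged fragments of one ion series.
    Gaps that match no residue are reported as "?"; ambiguous gaps list
    all candidates separated by "|".
    """
    peaks = sorted(peaks_mz)
    sequence = []
    for low, high in zip(peaks, peaks[1:]):
        candidates = match_residue(high - low, tol)
        sequence.append("|".join(candidates) if candidates else "?")
    return sequence


if __name__ == "__main__":
    # Hypothetical peak ladder built from an arbitrary starting m/z of 100.0.
    start = 100.0
    ladder = [start]
    for aa in ("G", "A", "S"):
        ladder.append(ladder[-1] + RESIDUE_MASS[aa])
    print(read_ladder(ladder))
```

A practical de novo tool would additionally have to handle noise peaks, missing fragments, multiple charge states, and mixed ion series, typically by scoring paths through a spectrum graph rather than reading a single ladder.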