Research And Implementation Of PepNovo Parallelization Based On Tandem Mass Spectrometry

Posted on: 2017-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2428330488479907
Subject: Computer technology
Abstract/Summary:
With the completion of human genome sequencing, life science research has entered the post-genomic era, and proteomics has become one of its most active areas. In proteomics, protein sequence analysis and identification by tandem mass spectrometry is an important research topic. However, the rapid development of mass spectrometry technology has produced vast amounts of data, which has greatly increased the processing time of current protein identification software and has become a bottleneck for the development of protein identification technology. In addition, although more and more organisms have been sequenced, there are still many species whose genetic sequences remain unknown. Moreover, because the database lacks a corresponding sequence or because of unknown post-translational modifications, database search methods leave a large amount of MS/MS spectral data unexplained. In such cases, de novo sequencing plays an important role in protein sequence analysis and identification, so research on accelerating de novo sequencing software for protein identification is necessary.

The main work of this thesis is the parallelization of the de novo sequencing algorithm PepNovo. The specific contributions are as follows.

First, for the de novo sequencing algorithm PepNovo, we propose MR-PepNovo, a MapReduce-based parallelization method. Building on the MapReduce computing architecture, the thesis parallelizes PepNovo by preprocessing and splitting the input data. Experimental results show that, compared with the original serial PepNovo algorithm, MR-PepNovo achieves a speedup of 4.45 with unchanged accuracy.

Second, we present a parallelization of PepNovo based on the high-performance computing platform MIC (Many Integrated Core). Because the MIC architecture is compatible with native CPU programs, the thesis designs and implements the MIC-based parallelization by porting the scoring-function portion of PepNovo to the MIC device. Experimental results on the MIC platform show that, with accuracy unchanged, the maximum speedup is 28 times, almost 1.9 times the maximum speedup obtained on the CPU platform. This indicates that the MIC-based parallelization of PepNovo achieves good acceleration.

This work on parallelizing the de novo sequencing algorithm PepNovo is a meaningful attempt, and we hope the method can promote the further development of protein identification technology.
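The input-splitting step that MR-PepNovo relies on can be illustrated with a minimal sketch. The abstract does not state the spectrum file format or the split granularity, so everything below is an assumption: the spectra are taken to be in Mascot Generic Format (MGF), where each spectrum is delimited by `BEGIN IONS` and `END IONS`, and the helper names `iter_spectra` and `split_spectra` are hypothetical. The key point is that a split must never cut a spectrum in half, so that each map task can run the unmodified serial algorithm on whole spectra.

```python
from typing import Iterable, Iterator, List

Spectrum = List[str]  # the lines of one BEGIN IONS ... END IONS block


def iter_spectra(lines: Iterable[str]) -> Iterator[Spectrum]:
    """Yield one complete spectrum block at a time (assumed MGF layout)."""
    block: Spectrum = []
    inside = False
    for raw in lines:
        line = raw.rstrip("\n")
        marker = line.strip()
        if marker == "BEGIN IONS":
            inside = True
            block = [line]
        elif marker == "END IONS" and inside:
            block.append(line)
            yield block
            inside = False
        elif inside:
            block.append(line)
        # Lines outside any block (blank lines, comments) are skipped.


def split_spectra(lines: Iterable[str], spectra_per_chunk: int) -> List[List[Spectrum]]:
    """Group whole spectra into chunks; each chunk would feed one map task."""
    if spectra_per_chunk < 1:
        raise ValueError("spectra_per_chunk must be at least 1")
    chunks: List[List[Spectrum]] = []
    current: List[Spectrum] = []
    for spectrum in iter_spectra(lines):
        current.append(spectrum)
        if len(current) == spectra_per_chunk:
            chunks.append(current)
            current = []
    if current:  # keep the final, possibly smaller, chunk
        chunks.append(current)
    return chunks
```

In a MapReduce setting, each chunk would be written to its own input split and processed independently, and the per-chunk sequencing results would then be concatenated. Because spectra are scored independently of one another in this sketch's model, splitting on spectrum boundaries leaves the per-spectrum output unchanged, which is consistent with the unchanged accuracy reported above.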
Keywords/Search Tags:Proteome, Tandem mass spectrometry, MapReduce, De Novo, PepNovo, MIC, Parallelization