Font Size: a A A

Research On Parallel Technology For Key Algorithms Of DNA Sequence Analysis Based On CPU-MIC Cooperations

Posted on:2015-11-18Degree:MasterType:Thesis
Country:ChinaCandidate:Y B CuiFull Text:PDF
GTID:2310330509460617Subject:Software engineering
Abstract/Summary:PDF Full Text Request
DNA sequence analysis is to explore the rule of life in the scope of molecular. It's the basis of researching and reforming the characters of organism. With the rapid development of sequencing technology, the scale of sequencing data explodes, which is far beyond the Moore's Law. State-of-the-art servers and software cannot deal with so much data efficiently in time. Based on this issue, we chose three important and fundamental algorithms, including short sequence about(SOAPsnp, BWA) and long sequence about(MC3), combine the CPU-MIC micro-heterogeneous architecture of Tianhe-2, and design three algorithms where CPU cooperated with MIC. Our approaches improve the time and space efficiency efficiently, with the precision maintaining. Our work includes:1. SOAPsnp is a popular SNP detection tool based on Bayesian Model. It's widely used for its high precision. The core algorithms of SOAPsnp include a lot of redundant computation and the key data structures are complicated, which leads to the low efficiency of the program. To solve the above problems, we redesign the algorithms and data structures of SOAPsnp by four-dimension matrix reduction, fast access table, and uniform fast sorting. With our approach, the complexity of algorithm decreases from O(n5) to O(n). Moreover, we design and implement a Tianhe-2 oriented large scale parallel algorithm by CPU cooperating with MIC. The optimized algorithm obtains up to 50 X speedup and the parallel efficiency is up to 73.6%. For multiple nodes(512 nodes, 12,288 CPU cores and 86,016 MIC cores), our approach gets 483.6 folds parallel speedup and linear scalability. The optimized software can accomplish the SNP calling of one whole human genome within one hour, whereas SOAPsnp needs up to one week.2. BWA is a popular DNA sequence aligner based on FM index. To improve the alignment speed and efficiency of BWA, we design a three-stage pipeline based parallel alignment algorithm by CPU cooperating with MIC. Our algorithm hides the delay of data I/O. We test the optimized algorithm in 2,048 nodes of Tianhe-2. The parallel efficiency is up to 63.4%.3. Mr Bayes is used for phylogenetics tree construction. Its core algorithm MC3 introduces a hot chain compared with MC2, which improves the accepted probability of Markov chains. The algorithm of Mr Bayes MC3 is complicated, and the number of iterations is large. To improve the efficiency of Mr Bayes, we propose a parallel construction strategy of phylogenetics tree by isomorphic division, where the construction of one tree is divided into several sub-tree constructions. Our approach decreases the granularity of algorithm and increase the parallel scale. Moreover, we design and implement a Tianhe-2 oriented parallel constructing algorithm by CPU cooperating with MIC. The parallel scale of optimized algorithm is up to 32 nodes on Tianhe-2, which improve the parallel scale by an order of magnitude, with the parallel efficiency up to 52.2%.
Keywords/Search Tags:DNA Sequence Analysis, CPU-MIC Cooperated Parallel, Tianhe-2, SNP, Sequence Alignment, Mr Bayes
PDF Full Text Request
Related items