Design And Implementation Of A Parallel DNA Sequences Mapping System Based On MPI

Posted on: 2015-11-23 | Degree: Master | Type: Thesis
Country: China | Candidate: H Li
GTID: 2180330422492275 | Subject: Computer technology
Abstract/Summary:
With the advent of second-generation sequencing technology, massive numbers of short DNA sequences (reads) are being produced. Although these high-throughput data greatly advance the life sciences, they also pose a new challenge to short-read mapping tools. Mapping tools such as BWA, Bowtie and mrsFAST have been developed in recent years, but they still struggle to meet the requirements for accuracy, mapping time and memory cost. This thesis studies DNA sequence mapping further and proposes two strategies, an MPI-based parallel mapping method and a sorting-based short-read mapping algorithm, which together handle the mapping of more than 100 GB of short DNA sequences.

The MPI-based parallel mapping method consists of three steps. First, the master node sends the sequences to the corresponding nodes and creates a hash index. Second, every node runs the exact mapping algorithm and sends its mapping results to the master node. Finally, the unmapped sequences are redistributed among the nodes, and each node runs the inexact mapping algorithm. The method can use an arbitrary number of nodes and threads, and it transmits data asynchronously while the mapping algorithm is running, which reduces the time spent on parallel data transfer.

The sorting-based DNA sequence mapping algorithm consists of a sectional sorting algorithm, an exact mapping algorithm and an inexact mapping algorithm. The exact mapping algorithm quickly traverses the sorted sequences to obtain the mapping results. The inexact mapping algorithm processes most sections of a sequence in the same way as exact mapping and searches for base errors only in the remaining sections. By reusing the sectional sorting results, the method reduces the number of inexact mapping operations.

To test the practical performance of the mapping algorithm and the parallel method, several experiments were carried out under the Linux operating system in an MPI parallel environment.
The results show that the proposed mapping algorithm is more efficient than traditional approaches when the number of errors is limited, and that the proposed parallel method makes effective use of computing resources and greatly improves mapping speed. The MPI-based parallel mapping system not only processes large volumes of data quickly but also has a low memory cost, indicating good performance and broad applicability.
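The three-step parallel method described above (distribute and index, exact mapping with results gathered at the master, redistribution of unmapped reads for inexact mapping) can be illustrated with a minimal single-process sketch. It is not the thesis's implementation: threads from concurrent.futures stand in for MPI ranks, the hash index is assumed to be keyed on whole fixed-length reads, the inexact step is a brute-force Hamming scan, and the asynchronous transfer overlap is not modeled. All function names (build_index, exact_worker, inexact_worker, partition, parallel_map) are hypothetical. A real MPI program might use mpi4py's comm.scatter and comm.gather for the distribution and collection steps, and comm.isend/comm.irecv for nonblocking transfers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(reference, k):
    """Hash index: every length-k substring of the reference -> start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def exact_worker(index, reads):
    """One simulated node: exact lookups against the shared hash index."""
    mapped, unmapped = {}, []
    for read in reads:
        if read in index:  # membership test does not create a defaultdict entry
            mapped[read] = list(index[read])
        else:
            unmapped.append(read)
    return mapped, unmapped


def inexact_worker(reference, reads, max_errors):
    """One simulated node: brute-force scan allowing max_errors substitutions
    (a stand-in for the thesis's inexact mapping algorithm)."""
    mapped = {}
    for read in reads:
        n = len(read)
        hits = [i for i in range(len(reference) - n + 1)
                if sum(a != b for a, b in zip(reference[i:i + n], read)) <= max_errors]
        if hits:
            mapped[read] = hits
    return mapped


def partition(items, num_nodes):
    """Round-robin split of items into num_nodes chunks."""
    return [items[i::num_nodes] for i in range(num_nodes)]


def parallel_map(reference, reads, num_nodes, max_errors=1):
    """Assumes all reads have the same length and num_nodes >= 1."""
    if not reads:
        return {}, []
    index = build_index(reference, len(reads[0]))  # step 1: master builds the index
    chunks = partition(reads, num_nodes)           # step 1: master distributes reads
    results, unmapped = {}, []
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        # Step 2: every node maps exactly; results are collected at the "master".
        for mapped, missed in pool.map(lambda c: exact_worker(index, c), chunks):
            results.update(mapped)
            unmapped.extend(missed)
        # Step 3: redistribute the unmapped reads and run inexact mapping.
        redistributed = partition(unmapped, num_nodes)
        for mapped in pool.map(lambda c: inexact_worker(reference, c, max_errors),
                               redistributed):
            results.update(mapped)
    return results, [r for r in unmapped if r not in results]
```

For example, parallel_map("ACGTACGTTGCA", ["ACGT", "GTTG", "ACGA", "AAAA"], num_nodes=2) maps ACGT and GTTG exactly in step 2, recovers ACGA (one substitution) in step 3, and leaves AAAA unmapped. Only the reads that fail exact mapping reach the more expensive inexact step, which is the point of splitting the work into two rounds.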
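The idea that exact mapping "traverses these sorted sequences quickly" can be illustrated by a merge pass: once both the reads and the reference windows are sorted, every exact match is found in a single linear sweep over the two lists. This is a minimal sketch under stated assumptions, not the thesis's sectional sorting: it sorts whole fixed-length reads and every reference window of the same length, and the names sorted_windows and exact_map_sorted are hypothetical.

```python
def sorted_windows(reference, read_len):
    """Every (substring, position) window of length read_len, sorted lexicographically."""
    windows = [(reference[i:i + read_len], i)
               for i in range(len(reference) - read_len + 1)]
    windows.sort()  # equal substrings end up ordered by ascending position
    return windows


def exact_map_sorted(reference, reads):
    """Exact mapping by one merge pass over sorted reads and sorted reference windows.

    Assumes all reads have the same length. Returns (mapped, unmapped): mapped sends
    each read to its ascending list of start positions; unmapped is in sorted order.
    """
    if not reads:
        return {}, []
    windows = sorted_windows(reference, len(reads[0]))
    sorted_reads = sorted(set(reads))  # duplicate reads are looked up once
    mapped, unmapped = {}, []
    j = 0
    for read in sorted_reads:
        # Skip every window that sorts before the current read.
        while j < len(windows) and windows[j][0] < read:
            j += 1
        # Collect the run of windows equal to the read.
        k = j
        while k < len(windows) and windows[k][0] == read:
            k += 1
        if k > j:
            mapped[read] = [pos for _, pos in windows[j:k]]
        else:
            unmapped.append(read)
        j = k  # the next read is strictly larger, so the pointer never moves back
    return mapped, unmapped
```

Because neither pointer ever moves backwards, the matching step is linear in the number of reads plus windows once sorting is done; the sort is where the cost sits, which is consistent with the abstract's emphasis on reusing sorting results.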
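The inexact strategy of handling most sections like exact mapping and checking for base errors only in the rest resembles the well-known pigeonhole filter: if a read has at most e substitutions and is split into e + 1 sections, at least one section must match the reference exactly, so only positions found by exact section lookups need a full mismatch check. The sketch below uses that technique as an assumed stand-in for the thesis's algorithm; it handles substitutions only (no insertions or deletions), and build_kmer_index, split_sections and inexact_map are hypothetical names.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash index mapping each length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def split_sections(read, n):
    """Split read into n contiguous sections, returned as (offset, section) pairs."""
    base, extra = divmod(len(read), n)
    sections, start = [], 0
    for s in range(n):
        length = base + (1 if s < extra else 0)
        sections.append((start, read[start:start + length]))
        start += length
    return sections


def inexact_map(reference, read, max_errors):
    """Start positions where read aligns with at most max_errors substitutions."""
    if len(read) < max_errors + 1:
        raise ValueError("read must be longer than max_errors")
    indexes = {}  # one hash index per distinct section length
    hits = set()
    # Pigeonhole: max_errors substitutions cannot touch all max_errors + 1 sections,
    # so some section matches exactly and yields a candidate start position.
    for offset, section in split_sections(read, max_errors + 1):
        k = len(section)
        if k not in indexes:
            indexes[k] = build_kmer_index(reference, k)
        for pos in indexes[k].get(section, []):
            start = pos - offset  # candidate start of the whole read
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            # Verify the candidate by counting mismatching bases.
            if sum(a != b for a, b in zip(window, read)) <= max_errors:
                hits.add(start)
    return sorted(hits)
```

For instance, with reference "ACGTACGTTGCA" and max_errors=1, the read "ACGA" is split into "AC" and "GA"; "AC" occurs at positions 0 and 4, and both candidate windows ("ACGT") differ from the read in exactly one base, so both positions are reported. Only candidates produced by an exact section hit are ever compared base by base.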
Keywords/Search Tags: second-generation sequencing, short read mapping, hash index, parallel programming with MPI