
Research And Implementation Of High-Throughput Sequencing Alignment Method Based On Distributed Computing

Posted on: 2016-03-31  Degree: Master  Type: Thesis
Country: China  Candidate: X Zhang  Full Text: PDF
GTID: 2180330467995085  Subject: Computer technology
Abstract/Summary:
High-throughput sequencing technology sequences very quickly, but the sequences it produces are short, which poses a major challenge for DNA sequence analysis. After analyzing the application requirements of current sequence alignment, the state of research on sequence alignment algorithms, and their shortcomings, this thesis proposes and implements a high-throughput sequencing alignment method based on distributed computing. The main contributions are as follows.

(1) A method for running a serial sequence alignment algorithm on a distributed system. Based on the characteristics of sequence alignment algorithms, this thesis proposes a new distributed sequence alignment method. The method follows the Master/Slave model and divides the process into four parts: data preprocessing, sequence distribution, sequence alignment, and result processing. Bowtie is ported to a distributed implementation using this method. Experiments show that the method makes full use of the performance of each node and improves the efficiency ratio.

(2) A dynamic load balancing algorithm for the MPI-based distributed sequence alignment system. The algorithm addresses the characteristics of the D-Mapping model and the fact that MPI provides no support for load balancing; it was developed after studying load balancing algorithms for distributed clusters and the factors that affect them. In this algorithm, the Master node collects the load information reported by the Slave nodes. Scheduling does not migrate entire processes; it only needs to adjust the position in the sequence file from which DNA sequences are sent. Real human genome DNA sequences are used to verify the feasibility and effectiveness of the algorithm.

(3) A fault-tolerance method for the MPI-based distributed sequence alignment system. The MPI standard does not provide an effective fault-tolerance mechanism: if an error occurs on one node, all processes exit, which greatly limits the use of D-Mapping on large-scale distributed clusters. Building on a study of MPI fault-tolerance issues, this thesis proposes a fault-tolerance method that combines user-controlled checkpoints with MPI inter-communicators. The method first creates an inter-communicator between the Master node and each Slave node, so that the failure of one node does not cause the whole system to exit. During computation it then stores user-controlled checkpoints, which are used to reschedule tasks and recover after a node error. Real human genome DNA sequences are used to verify the effectiveness of the method.
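The scheduling and recovery ideas in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the thesis implementation: it uses no MPI, the names `chunk_offsets`, `Master`, and `run` are hypothetical, and it swaps in a simple "idle worker asks for the next range" policy in place of the load information the Slave nodes report in the thesis. What it shows is that the Master schedules only byte positions in the read file, and that a checkpoint of finished ranges lets the range held by a failed worker be handed out again.

```python
from collections import deque


def chunk_offsets(data: bytes, reads_per_chunk: int):
    """Split a newline-delimited read file into (start, end) byte ranges."""
    ranges, start, pos, count = [], 0, 0, 0
    for line in data.splitlines(keepends=True):
        pos += len(line)
        count += 1
        if count == reads_per_chunk:
            ranges.append((start, pos))
            start, count = pos, 0
    if count:  # trailing partial chunk
        ranges.append((start, pos))
    return ranges


class Master:
    """Hypothetical scheduler: it tracks file positions, not processes."""

    def __init__(self, ranges):
        self.pending = deque(ranges)  # byte ranges not yet handed out
        self.in_flight = {}           # worker id -> range being aligned
        self.done = []                # checkpoint: ranges with saved results

    def next_range(self, worker):
        """Hand the next byte range to a worker that reports it is idle."""
        if not self.pending:
            return None
        rng = self.pending.popleft()
        self.in_flight[worker] = rng
        return rng

    def complete(self, worker):
        """Record a finished range in the checkpoint."""
        self.done.append(self.in_flight.pop(worker))

    def fail(self, worker):
        """Requeue the range of a failed worker so another worker redoes it."""
        rng = self.in_flight.pop(worker, None)
        if rng is not None:
            self.pending.appendleft(rng)


def run(data, reads_per_chunk, workers, failing=()):
    """Simulate dispatch; workers listed in `failing` die on their first range."""
    master = Master(chunk_offsets(data, reads_per_chunk))
    alive = list(workers)
    while master.pending and alive:
        for w in list(alive):
            if master.next_range(w) is None:
                break
            if w in failing:
                master.fail(w)      # assumed: the failure is detected by the Master
                alive.remove(w)
            else:
                master.complete(w)  # assumed: alignment of this range succeeded
    return sorted(master.done)
```

For example, `run(data, 2, ["w1", "w2"], failing=("w2",))` still covers every byte range of `data`, because the range that `w2` was holding is returned to the queue and picked up by `w1`. In a real MPI deployment the workers would run in separate processes and the checkpoint would be written to stable storage; both are outside this sketch.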
Keywords/Search Tags: distributed computing, sequence alignment, load balancing, fault-tolerant