A Parallel and Optimized Algorithm for De Novo Short Read Assembly Using De Bruijn Graphs

Posted on: 2014-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Qiu
Full Text: PDF
GTID: 2250330425972301
Subject: Computer Science and Technology
Abstract/Summary:
Genome sequencing is a core part of genomics. With the emergence and development of new sequencing technologies, huge volumes of sequencing data can be obtained in a relatively short time, and sequencing has moved toward high throughput, low cost and high precision. As a result, processing massive sequencing data accurately has become the bottleneck in the development of DNA sequencing.

Through an analysis of existing assembly methods and related techniques based on the de Bruijn graph, the advantages and disadvantages of new-generation sequencing technology are presented. To address the short length, huge number and high throughput of reads, the de Bruijn graph algorithm is optimized by combining it with a decision table. Updating the information in the decision table optimizes both the selection of the optimal path and the range of candidate subsequent k-mers, which reduces assembly time and improves contig accuracy. Performance is further improved by parallelizing two stages separately: the I/O that reads and stores the sequencing data, and the splicing (assembly) process.

Experimental results show that the proposed algorithm runs faster and consumes less memory per computing node. When assembling 20 G of C. elegans genome data in parallel on 8 processors, it achieves a speedup of 6 times, so it can be applied to handle large-scale graphs within a constant amount of memory in a cluster.
Keywords/Search Tags: de Bruijn graph, sequence assembly, decision table, parallel
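
As a rough illustration of the de Bruijn graph approach the abstract builds on, the sketch below constructs a graph from short reads and extends a contig along an unambiguous path. It is a generic textbook-style example, not the thesis's algorithm: the decision table and the parallel I/O and splicing stages are not described in enough detail here to reproduce, and the function names `build_de_bruijn` and `extend_contig`, the toy reads, and the value of `k` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it.

    Hypothetical helper: every k-mer in every read contributes one edge.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk from `start` while the path is unambiguous.

    The walk stops at a branch (more than one successor), at a merge
    (successor has more than one predecessor), at a dead end, or on a cycle.
    """
    indegree = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1

    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in seen:
            break
        contig += nxt[-1]  # each step adds exactly one new base
        seen.add(nxt)
        node = nxt
    return contig


# Toy input: three overlapping reads drawn from the sequence ATGGCGTACC.
reads = ["ATGGCG", "GGCGTA", "CGTACC"]
graph = build_de_bruijn(reads, k=4)
print(extend_contig(graph, "ATG"))  # prints ATGGCGTACC
```

In a real assembler the interesting work happens where this sketch simply stops, namely at branches and merges caused by repeats and sequencing errors. That is the path-selection step the abstract says is guided by the decision table.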