
The Method And Parallel Processing For Fragment Assembly In Whole Genome DNA Sequencing Project

Posted on: 2003-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: B F Zhang
GTID: 2168360092998983
Subject: Computer Science and Technology
Abstract/Summary:
Bioinformatics is the science of using information to understand life processes. It combines the strengths of biology, computer science and technology, and mathematics to explore the biological meaning of vast volumes of biological data. Whole-genome DNA sequencing is fundamental to bioinformatics, and in the commonly used shotgun method, reconstructing the original sequence from the information in all the fragments is one of the key steps. Fragment assembly is an intricate, time-consuming process with several practical difficulties, one of which is that repeats interfere with identifying the correct overlaps between fragments.

After an in-depth analysis of existing assembly methods and their software implementations, such as Phrap and EULER, and a theoretical study of how well unique fixed-length substrings can mark the relative positions of fragments, we propose an approach that screens out repeat information using fixed-length characteristic substrings. We also compute the optimal length of the characteristic substring, and from this we derive the PL-condition, which decides whether two fragments should undergo pairwise alignment. Before pairwise alignment, the fragment data must be scanned to collect, for every fixed-length substring, its number of occurrences and the fragments in which it occurs, so that characteristic substrings can be assigned to each fragment.

Our assembly process has three steps: fragment alignment, list joining, and contig merging. It not only uses the screening approach above as a preprocessing step but also builds its data structures around the characteristic substring information, so the PL-condition applies naturally during assembly and reduces the number of fragment alignments. The implementation, PDL-Assembler, uses a concise linear data structure to keep processing efficient and iteration simple.
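
The scanning and screening idea described above can be illustrated with a minimal Python sketch. It is not the thesis's PL-condition, whose exact definition the abstract does not give. The function names, the `max_fragments` threshold, and the rule "two fragments are candidates if they share a low-frequency substring" are all assumptions made for illustration, and reverse-complement strands are ignored for brevity.

```python
from collections import defaultdict
from itertools import combinations


def index_kmers(fragments, k):
    """Map each length-k substring to the set of fragment indices containing it."""
    index = defaultdict(set)
    for frag_id, seq in enumerate(fragments):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_pairs(fragments, k, max_fragments):
    """Return fragment pairs that share at least one low-frequency k-mer.

    A k-mer found in more than `max_fragments` fragments is treated as
    likely repeat material and skipped. This threshold is a hypothetical
    stand-in for the screening rule, not the thesis's PL-condition.
    """
    pairs = set()
    for frag_ids in index_kmers(fragments, k).values():
        if 2 <= len(frag_ids) <= max_fragments:
            pairs.update(combinations(sorted(frag_ids), 2))
    return pairs


if __name__ == "__main__":
    reads = ["AAAACCCC", "CCCCGGGG", "GGGGTTTT", "CCCCAAAA"]
    # "CCCC" occurs in three reads, so with max_fragments=2 it is screened out.
    print(sorted(candidate_pairs(reads, k=4, max_fragments=2)))
```

Only the pairs returned by `candidate_pairs` would then be passed to a pairwise aligner, which is how a screen of this kind reduces the number of alignments.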
Test results indicate that PDL-Assembler takes much less time than Phrap.

Finally, we study a strategy for parallelizing the problem. By analyzing how the fragment data set can be partitioned and how parallelizable the sequential program is, we devise a strategy that speeds up the time bottleneck of PDL-Assembler and provides a way to scan the fragment data in parallel. We also discuss hiding communication latency with non-blocking communication and reducing communication overhead through the trade-off between the number of communications and the size of the pack buffer. Results from two test routines show that our parallel program, ParPDL-Assembler, achieves good speedup and efficiency.
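
The divide-and-merge structure behind scanning fragment data in parallel might look like the following Python sketch. It is an assumption-laden illustration, not ParPDL-Assembler: the abstract describes message passing with non-blocking communication and pack buffers, whereas this example uses a thread pool on a single machine purely to show how the fragment set can be partitioned and the partial counts combined. All function names and the `n_workers` parameter are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_kmers(fragments, k):
    """Count occurrences of every length-k substring in a list of fragments."""
    counts = Counter()
    for seq in fragments:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def split_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_count_kmers(fragments, k, n_workers=4):
    """Scan each chunk independently, then merge the partial counts."""
    chunks = split_chunks(fragments, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(lambda chunk: count_kmers(chunk, k), chunks):
            total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG", "ACGTTTTT"]
    assert parallel_count_kmers(reads, 3) == count_kmers(reads, 3)
```

Because each chunk is scanned without reference to the others, the only shared step is the final merge. In a message-passing setting, that merge is where the number of messages and the buffer size per message would trade off against each other. CPython threads do not speed up CPU-bound work, so the sketch demonstrates the decomposition only, not the reported speedup.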
Keywords/Search Tags: Bioinformatics, Genome, DNA Sequencing, Fragment Assembly, Repeat, Parallel Processing