
Parallelization Of Bioinformatics Sequence Assembly Program

Posted on: 2003-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yang
Full Text: PDF
GTID: 2208360185995495
Subject: Computer applications
Abstract/Summary:
With the rapid improvement of biological sequencing techniques, humans are obtaining more and more information about living organisms, such as DNA and protein data. Researchers can no longer analyze this overwhelming volume of data by hand, and this has already slowed our understanding of the nature of life. Using computers to move the work forward is the natural step, and the combination of computational techniques and biology has produced a new discipline: bioinformatics. To make biological information accessible to computers, researchers use sequencing to transform living matter into machine-readable data. After sequencing, what the computer handles are character strings (in biological terms, "sequences"), which have long been a major research area of computer science. Because sequencing accuracy is limited, the correctness of a sequence cannot be guaranteed beyond a certain length. What we obtain are therefore fragments of a long sequence, and they have to be assembled to reconstruct the original data. The most widely used program for this task is phrap. Phrap is excellent software for assembling fragments, but it still has problems of its own, such as memory demand and running time. In this thesis we carried out original research to solve these problems.

Chapter 1 introduces the theoretical basis of phrap's core algorithm. Chapter 2 analyzes the data structures, the overall algorithm, and the main functions. Chapters 3 and 4 present the solutions to the memory problem and the time-cost problem, respectively.

The contributions of this thesis are as follows:

1. Phrap was ported to the Dawning-3000 cluster using a shared-memory approach. This work distributes at least half of the memory demand across the nodes of the cluster. Data sets that could not be processed by the serial program can now be handled. In addition, if a task takes up almost the entire memory of one node when executed serially, at least two tasks of the same scale can now be run at the same time. This moves the work forward as a whole and makes full use of the computational resources. As far as I know, no research of the same kind has been published to date.

2. Targeting the most time-consuming part (the sequence-assembly part), we extracted parallelism from this structurally serial part. We implemented the...
Keywords/Search Tags:sequence, assembly, alignment, phrap, parallelizing