Parallelization Of Bioinformatics Sequence Assembly Program

Posted on:2003-12-02

Degree:Master

Type:Thesis

Country:China

Candidate:Q Yang

GTID:2208360185995495

Subject:Computer applications

Abstract/Summary:

With the rapid improvement of biological sequencing techniques, humans are obtaining more and more information about living organisms, such as DNA and protein data. Researchers can no longer analyze this overwhelming volume of data by hand, and this has slowed our understanding of the nature of life. It is natural to use computers to speed up the work. The combination of computational techniques and biology has given rise to a new discipline: bioinformatics. To make biological information readable by computers, researchers use sequencing techniques to transform living matter into data. After sequencing, what computers face are character strings (in biological terms, "sequences"), which have long been a major research area of computer science. Because sequencing accuracy is limited, the correctness of a sequence cannot be guaranteed beyond a certain length. What we obtain are therefore segments of a long sequence, which have to be assembled to reconstruct the original data. The most widely used program for this task is phrap. Phrap is excellent software for assembling the segments, but it has problems of its own, such as memory demand and time cost. In this thesis, we present original research to solve these existing problems. Chapter 1 introduces the theoretical basis of phrap's core algorithm. Chapter 2 analyzes the data structures, the overall algorithm, and some of the main functions. Chapters 3 and 4 present the solutions to the memory and time-cost problems, respectively. The contributions of this thesis are as follows: 1. Phrap was ported to the Dawning-3000 cluster using a memory-sharing method. This work distributes at least half of the memory demand across the nodes of the cluster. Some data sets that the serial program cannot process can now be handled. Also, if one task takes up almost all the memory of one node when executed serially, at least two tasks of the same scale can now run at the same time. This speeds up the work as a whole and makes full use of the computational resources. To the best of my knowledge, no research of the same kind has been published to date. 2. Targeting the most time-consuming part (the sequence-assembly part), we extract the parallelism from this structurally serial part. We realized the...

Keywords/Search Tags:

sequence, assembly, alignment, phrap, parallelizing

Related items

1	Research Of Improvement And Parallelization For Sequence Assembly And Multiple Sequence Alignment
2	Research On Product Assembly Sequence Planning Methods For Maintenance
3	Research On The Theory, Method Of Assembly Sequence Generation, Evaluation And Optimization In Digital Product Pre-assembly
4	The Research And Implementation Of Biological Sequence Alignment
5	Biological Sequence Alignment Problem
6	Study Of Computer Aided Assembly Sequence Planning Based On Graph Theory
7	Design And Optimization Of Parallel Algorithm For Biological Sequences
8	Research And Application On Digital Pre-assembly To The Laser Product
9	The Application Of ACO And Coding Method In Sequence Analysis
10	Research On Multiple Sequence Alignment Algorithms In Bioinformatics