Font Size: a A A

Parallel Bio-computing Research And Implementation Of The Network Environment

Posted on:2005-08-07Degree:MasterType:Thesis
Country:ChinaCandidate:J C HuangFull Text:PDF
GTID:2208360125464194Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Biocomputing (or Bioinformatics) is an emerging interdisciplinary research area which applies computational methods to biological problems, especially in the field of molecular biology.The fundamental building blocks of life are proteins. Proteins are variable length linear, mixed polymers of 20 different amino acids. The genetic information we inherit from our parents is stored in DNA, and that information is used to make identical copies of that DNA and is also transferred from DNA to RNA to protein. DNA is a linear polymer of 4 nucleotides deoxyAdenosine monophosphate (abbreviated A), deoxyThymidine monophosphate (abbreviated T), deoxyGuanosine monophosphate (abbreviated G) and deoxyCytidine monophosphate (abbreviated C). RNA is a very similar polymer of Adenosine monophosphate, Guanosine monophosphate, Cytidine monophosphate, and Uridine monophosphate. A property of both DNA and RNA is that the linear polymers can pair one with another, such pairing being sequence specific. There has been a very successful international effort to collect all the sequences people have determined in one place so they can be searched. For DNA sequences, three groups have cooperated in this effort, one in Japan, one in Europe, and one in the United States to produce DDBJ, EMBL and GenBank, respectively. These databases are frequently reconciled with each other, so that searching any one is virtually the same as searching all three. The problem is that these databases are HUGE and, as a result, we must compare our sequence with this vast number of other sequences efficiently. A number of programs, such as BLAST and FASTA, have been written to rapidly search a database for a query sequence.The Smith Waterman algorithm for local sequence alignment is one of the most important techniques in computational molecular biology. This ingenious dynamic programming approach was designed to reveal the highly conserved fragments by discarding poorly conserved initial and terminal segments. However, the local alignment sometimes produces a mosaic of well conserved fragments artificially connected by poorly conserved or even unrelated fragments. This may lead to problems in comparison of long genomic sequences and comparative gene prediction. In this paper we propose a new strategy of dynamic accelerated penalty strategy to fix this problem. In the process of computing similarity matrix, if similarity value is larger than the pre-specified threshold X then start our strategy, when related character mismatches, then penalize more than others until similarity value is 0 or the process ends. The method does not change the standard Smith Waterman algorithm much and the running time and space does not increase, while test results show the superior performance of the technique and potential for the sequence local alignment.The approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. The Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities.The program of optimization BLAST with Parallel Methods is designed and realized. The parallel platform uses MPI. Main node distribute total task with load balance strategy and send them to every slave nodes. These slave nodes search database in their own search _range and send the partly results to the main node. Then the main node gather these information from slave nodes and do alignment again to the result sequence No. Finally the main node output the alignment result. The performance of the technique, parallel speedup and efficiency, are analyzed with Amdahl law. And the test results are given in this paper.
Keywords/Search Tags:Biocomputing, Sequence Alignment, BLAST, Parallel Computing, Optimizing with Parallel Methods
PDF Full Text Request
Related items