Relatedness of biological sequences using alignment and restriction map databases

Posted on:1997-01-11

Degree:Ph.D

Type:Thesis

University:Michigan State University

Candidate:Kim, Jin

Full Text:PDF

GTID:2468390014980651

Subject:Computer Science

Abstract/Summary:

PDF Full Text Request

Comparative analysis of DNA and protein macromolecules has been an important component of biological research. Sequence alignment and restriction map comparison, in particular, have been useful in the study of molecular evolution, RNA folding, and protein structure-function relationships. In this thesis, we have investigated multiple alignment problems involving RNA and protein sequences, and the methods to infer relatedness between sequences using restriction patterns and restriction map databases.; We first consider the multiple sequence alignment problem. Multiple sequence alignment is used to discover an optimal alignment based on defined criteria. Variations of dynamic programming provide most commonly used tools for multiple sequence alignment methods. However, the biggest obstacle to using dynamic programming for multiple sequence alignment has been the high computational complexity. Due to this high computational complexity, dynamic programming cannot be effectively used to align more than three sequences and to apply certain types of cost functions. To overcome the limitations of dynamic programming, an algorithm called MSASA, based on simulated annealing, was developed. MSASA can overcome these limitations.; We then consider the multiple RNA sequence alignment problem to identify possible RNA secondary structures. An algorithm called RNASA, based on simulated annealing, is suggested for multiple RNA sequence alignment. In this algorithm, RNA sequences are aligned based on primary sequence similarity and then realigned based on secondary structure information. Dot matrices generated from intra-sequence comparisons are used to obtain possible common secondary structures. Several strategies to reduce the simulated annealing time are suggested.; Sequence database searches have been used for identifying unknown macro-molecules. The whole or a part of the primary sequence information is required to do the sequence database search. However, sequencing a macromolecule is expensive and time consuming. A more crude but faster method, without sequencing an unidentified macromolecule, has been developed. The method is based on restriction patterns of an unknown isolate and restriction map databases. We use a three-phase approach. Maximum Site Matching Problem (MSMP) is defined and is proved to be in the class of NP-complete problems. We demonstrate the usefulness of this approach by applying it to the rRNA sequences of the Ribosomal Database Project (RDP).

Keywords/Search Tags:

Sequence, Alignment, Restriction map, RNA, Database, Dynamic programming, Using

PDF Full Text Request

Related items

1	Biological Sequence Alignment Problem
2	Research On The Algorithms Of RNA Sequence Alignment
3	Implementation of a dynamic programming algorithm for DNA sequence alignment on the Cell Matrix(TM) architecture
4	Research On Pairwise Sequence Alignment Algorithms
5	Application Of Linear Space Algorithm For Pairwise Sequence Alignment Based On Parallel-Computing
6	Reaseach On Pairwise Sequence Alignment Needleman-Wunsch Algorithm
7	The Research And Implementation Of Biological Sequence Alignment
8	Study On Chinese Sequence Alignment Based On Word2VEC And CRF
9	Research Of Improvement And Parallelization For Sequence Assembly And Multiple Sequence Alignment
10	Automatically Get To Build The Study Of Biological Information Platform And Sequence Alignment Algorithm Based On Information