Relatedness of biological sequences using alignment and restriction map databases | | Posted on:1997-01-11 | Degree:Ph.D | Type:Thesis | | University:Michigan State University | Candidate:Kim, Jin | Full Text:PDF | | GTID:2468390014980651 | Subject:Computer Science | | Abstract/Summary: | PDF Full Text Request | | Comparative analysis of DNA and protein macromolecules has been an important component of biological research. Sequence alignment and restriction map comparison, in particular, have been useful in the study of molecular evolution, RNA folding, and protein structure-function relationships. In this thesis, we have investigated multiple alignment problems involving RNA and protein sequences, and the methods to infer relatedness between sequences using restriction patterns and restriction map databases.; We first consider the multiple sequence alignment problem. Multiple sequence alignment is used to discover an optimal alignment based on defined criteria. Variations of dynamic programming provide most commonly used tools for multiple sequence alignment methods. However, the biggest obstacle to using dynamic programming for multiple sequence alignment has been the high computational complexity. Due to this high computational complexity, dynamic programming cannot be effectively used to align more than three sequences and to apply certain types of cost functions. To overcome the limitations of dynamic programming, an algorithm called MSASA, based on simulated annealing, was developed. MSASA can overcome these limitations.; We then consider the multiple RNA sequence alignment problem to identify possible RNA secondary structures. An algorithm called RNASA, based on simulated annealing, is suggested for multiple RNA sequence alignment. In this algorithm, RNA sequences are aligned based on primary sequence similarity and then realigned based on secondary structure information. Dot matrices generated from intra-sequence comparisons are used to obtain possible common secondary structures. Several strategies to reduce the simulated annealing time are suggested.; Sequence database searches have been used for identifying unknown macro-molecules. The whole or a part of the primary sequence information is required to do the sequence database search. However, sequencing a macromolecule is expensive and time consuming. A more crude but faster method, without sequencing an unidentified macromolecule, has been developed. The method is based on restriction patterns of an unknown isolate and restriction map databases. We use a three-phase approach. Maximum Site Matching Problem (MSMP) is defined and is proved to be in the class of NP-complete problems. We demonstrate the usefulness of this approach by applying it to the rRNA sequences of the Ribosomal Database Project (RDP). | | Keywords/Search Tags: | Sequence, Alignment, Restriction map, RNA, Database, Dynamic programming, Using | PDF Full Text Request | Related items |
| |
|