Algorithms for biological sequence problems

Posted on:2002-09-22

Degree:Ph.D

Type:Dissertation

University:Carnegie Mellon University

Candidate:Halldorsson, Bjarni Vilhjalmur

Full Text:PDF

GTID:1468390014450790

Subject:Mathematics

Abstract/Summary:

We explore computational problems arising in biological sequence analysis. In particular we explore problems related to DNA sequencing, protein identification and phylogenetic analysis. We show how biological problems can be formulated into a framework amenable to computation and we give efficient algorithms for these problems.; We study a proposed method for sequencing DNA, interactive sequencing by hybridization (ISBH). In this approach a series of arrays (SBH chips) are built and on each SBH chip a large number of oligonucleotides are arranged. A combinatorial method is used to construct the sequence from the collection of probes that occur in it. We give an optimal algorithm to decide which oligonucleotides to lay on the chips.; We study computationally the feasibility of identifying proteins using antibodies that are specific to short sequences of amino acids. A set of antibodies are built and laid on a chip. The proteins are then identified from their binding to the antibodies, the binding of the proteins to the antibodies being dependent on whether the amino acid substring occurs in the protein. For this problem we give two separate results. First we show, assuming that no errors occur in the antibody binding process, that the number of antibodies that need to be built is manageable. Secondly, we give a method for identifying proteins in the presence of a large number of false negative errors and a modest number of false positive errors.; Assuming that no errors occur in the antibody binding, we model the protein identification problem as a minimum test collection problem. For this problem, we: (1) show that it cannot be approximated efficiently within o(log n) factor where n is the number of proteins unless P = NP; (2) give new algorithms for the case when the size of the test is bounded by a small constant and more specially when that constant is two; (3) show that there does not exist an efficient approximation scheme for this problem even when the size of the largest set is bounded by two, unless P = NP.; We study the problem of rearranging gene trees to minimize the number of duplications and losses. We give an efficient algorithm for this problem, improving on a previous brute-force algorithm.

Keywords/Search Tags:

Problem, Sequence, Biological, Algorithm, Give

Related items

1	Biological Sequence Alignment Problem
2	Research On Key Technologies Of Parallel Optimization For Biological Sequence Analysis Algorithm Based On CPU+GPU Heterogeneous System
3	Research And Implementation Of Index Structure Of Biological Sequence
4	Research On Key Technologies Of Accelerator For Biological Sequence Analysis
5	Biological sequence analysis: Calibration of sequence alignment and hybridization metrics
6	Research On Algorithm For Similarity Search Of Biological Sequence Database
7	Optimization problems in data mining
8	Research On Biological Sequence Classification Based On Machine Learning Methods
9	Research And Application Of Index On Biological Sequences
10	Biological Heuristic Algorithms And Improvement Research