Font Size: a A A

Algorithms for biological sequence problems

Posted on:2002-09-22Degree:Ph.DType:Dissertation
University:Carnegie Mellon UniversityCandidate:Halldorsson, Bjarni VilhjalmurFull Text:PDF
GTID:1468390014450790Subject:Mathematics
Abstract/Summary:
We explore computational problems arising in biological sequence analysis. In particular we explore problems related to DNA sequencing, protein identification and phylogenetic analysis. We show how biological problems can be formulated into a framework amenable to computation and we give efficient algorithms for these problems.; We study a proposed method for sequencing DNA, interactive sequencing by hybridization (ISBH). In this approach a series of arrays (SBH chips) are built and on each SBH chip a large number of oligonucleotides are arranged. A combinatorial method is used to construct the sequence from the collection of probes that occur in it. We give an optimal algorithm to decide which oligonucleotides to lay on the chips.; We study computationally the feasibility of identifying proteins using antibodies that are specific to short sequences of amino acids. A set of antibodies are built and laid on a chip. The proteins are then identified from their binding to the antibodies, the binding of the proteins to the antibodies being dependent on whether the amino acid substring occurs in the protein. For this problem we give two separate results. First we show, assuming that no errors occur in the antibody binding process, that the number of antibodies that need to be built is manageable. Secondly, we give a method for identifying proteins in the presence of a large number of false negative errors and a modest number of false positive errors.; Assuming that no errors occur in the antibody binding, we model the protein identification problem as a minimum test collection problem. For this problem, we: (1) show that it cannot be approximated efficiently within o(log n) factor where n is the number of proteins unless P = NP; (2) give new algorithms for the case when the size of the test is bounded by a small constant and more specially when that constant is two; (3) show that there does not exist an efficient approximation scheme for this problem even when the size of the largest set is bounded by two, unless P = NP.; We study the problem of rearranging gene trees to minimize the number of duplications and losses. We give an efficient algorithm for this problem, improving on a previous brute-force algorithm.
Keywords/Search Tags:Problem, Sequence, Biological, Algorithm, Give
Related items