Font Size: a A A

Studies On Algorithms For Genome Rearrangement Problems

Posted on:2011-08-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:X YinFull Text:PDF
GTID:1100360305950550Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
It is a great challenge in the coexistence and struggle of human being and the nature to explain people's confusion about life using genetic information. Exploring the principles of life through genetic information can not work without "calculation". Computational biology has become a very active research area during the past twenty years.Genome rearrangement becomes an important study area of computational biology. In the late 1930s, Dobzhansky and Sturtevant published a milestone paper with an evolutionary tree presenting a rearrangement scenario with 17 reversals for the species Drosophila pseudoobscura and Miranda, which gave convincing proof for the existence of genome rearrangements. Many subsequent studies show that genome rearrangement is a common mode of molecular evolution and one main reason for the diversities of microbes, plants and animals.A chromosome is a sequence of genes and does not have an orientation. A genome is a set of chromosomes. Each gene is represented by an integer. If the direction of a gene is known, a sign of "+" or "-" will be added before the gene to indicate the direction. Some species' genomes contain single chromosome, while some species' genomes are partitioned into a number of chromsomes. A genome (or chromosome) is signed if the genes it contains are signed, and otherwise is unsigned.Although genome rearrangement process is very complicated, there are three basic rearrangement operations:reversal, translocation and transposition. A rearrangement operation transforms one genome into another. The evolution process between two species can be approximated by the genome rearrangements transforming one genome into the other.Given two genomes A and B, the problem of genome rearrangement asks to find the minimum number of rearrangements as well as the shortest sequence of rearrangements transforming A into B. The minimum number of rearrangements is called the rearrangement distance between A and B. Based on the tests data in molecular biology, rearrangement distance is one significant indicator to figure the relationship between species. The genome rearrangement problems have been extensively studied in recent years. The basic genome rearrangement problem only consider single operation, such as reversal sorting, translocation sorting and transposition sorting. Reversal is the operation which reverses the order of a sequence in a chromosome. There are several polynomial algorithms for sorting signed genomes by reversals. The best known algorithm can compute the shortest reversal sequence in O(n(?)). For unsigned case, it has been proved to be Np-hard. The best known approximation ratio is 1.375. Translocation is the operation which exchanges ends between two chromsomes. Sorting signed genomes by translocations has been proved to be solvble in polynomial-time. Subsequent works decreased the time complexity. The best known algorithm can compute the shortest translocation sequence in O(n(?)). For unsigned case, the problem was proved to be Np-hard. The best known approximation ratio is 1.5+ε. Transposition is the operation which swaps the positions of two adjacent segments in a chromosome. The complexity of sorting by transpositions is still unknown. It has several 1.5-approximation algorithms. The best known approximation ratio is 1.375. There are also problems considering two or more operations. The problem which considers both reversal and translocation is regarded as one of the most natural problems of genome rearrangement. Hannenhalli and Pevzner presented the first polynomial algorithm for the signed case.Break-point graph is a basic tool for designing algorithms for genome rearrangement problems. It has been proved that the rearrangement distance is closely related to some special combinatorial stuctures in the break-point graph.In this thesis, two problems for genome rearrangement are considered.(1) Sorting signed genomes by generalized translocations.Translocations include reciprocal translocations and non-reciprocal translocations. The problem of sorting by translocations asks to find a shortest sequence of translocations transforming one multi-chromosomal genome into another. There are several polynomial algorithms for the signed case.It is to be noted that all the algorithms only consider reciprocal translocations. If a genome can be transformed into another only by reciprocal translocations, the two genomes must have the same set of chromosome ends. Thus the above algorithms can only be applied to a pair of genomes having the same set of chromosome ends, which rarely happens in biological practice. Such a restriction can be removed if non-reciprocal translocations are also allowed. So we have to consider a more general case when two genomes have different chromosome ends. Clearly, in such a case, non-reciprocal translocation are needed. Ozery-Flato firstly mentioned the problem of sorting by generalized translocations which considers both reciprocal and non-reciprocal translocations. She conjectured that this problem can be solved in polynomial-time. However, there is little progress in this problem.In chapter 3 and chapter 4, we show how to extend the polynomial algorithm for sorting by reciprocal translocations for sorting by generalized translocations, which allows us to compare genomes containing different chromosome ends. The thesis proposes the idea to "capping" the source genome A and target genome B by additional 2n markers to construct a pair of co-tailed genomes A and B. Then the optimal generalized translocation sequence between A and B can be mimiced by the optimal reciprocal translocation sequence between A and B, which leads to an exponential algorithm. In order to improve the time complexity, the thesis further proposes the approach to construct partial graph of A and B from the break-point graph of A and B. This approach reduce the problem to a problem of adding several gray edges to the partial graph such that the reciprocal translocation distance of the resulting break-point graph is minimum.In chapter 3, a OPT+2-approximation for sorting by generalized translocations is proposed, where OPT is the optimal solution of this problem. The number of minimal subpermutations is an important parameter in the formula to compute the reciprocal translocation distance. We present a method for adding gray edges to the partial graph, which results in a break-point graph in which the difference of the number of cycles and the number of minimal subpermutations is maximum.In chapter 4, we present the generliazed translocation distance formula as well as a polynomial algorithm for sorting by generlized translocations. Discussing the parameter f in the reciprocal translocation distance formula is the key to compute the exact value of generliazed translocation distance. It is also a difficult point to deal with the parameter f, because the value of f is related to the even and odd of the number of minimal subpermutations and the existence of even-isolation. To prove a exact lower bound for generlized translocation distance, we give the definition of four combinatorial structures in the partial graph which may result in even-isolation. Finally, we present an optimal solution for adding gray edges such that the reciprocal translocation distance of the resulting break-point graph is equal to the exact lower bound.(2) Sorting signed genomes by reversals and translocations.This problem is currently solved by Hannenhalli by a recuction to the problem of sorting signed genomes by reversals, where each reversal simulates either a reciprocal translocation or an internal reversal. However, Hannenhalli's algorithm can hardly distinguish between reversal and translocation operations. In a recent paper, Ozery-Flato and Shamir wrote that:"We believe that an algorithm for sorting by reversals and translocations that explicitly treats translocations and reversals as distinct operations would be more natural and powerful than one that does not."In chapter 5, a new algorithm that treats translocations and reversal as distinct operations is proposed. The number of knots in break-point graph was shown to be closely related to the reversal and translocation distance. When there is no knot in the break-point graph, the problem can be solved in polynomail-time via the theories in reversal sorting and translocation sorting. As a result, it is the key point for designing the algorithm to find a valid reversal and translocation sequence to eliminate all the knots.The main contributions of the thesis are as follows:1. For the first time, we consider the more general case for sorting by translocations-when the source genome and the target genome contain different set of chromosome ends. We present an OPT+2-approximation for sorting by generalized translocations, where OPT is the optimal solution of this problem. This algorithm can produce an asymptotically optimal sequence transforming the source genome to the target genome.2. We present an exact formula for computing the generalized translocation distance as well as a polynomial-time algorithm for sorting by generlized translocations. The algorithm confirms Ozery-flato's conjecture that sorting by generalized translocation can be solved in polynomial-time.3. We present a new polynomial algorithm for sorting signed genomes by reverals and translocations, which treats reversals and translocations as distinct operations.
Keywords/Search Tags:Genome rearrangement, Reversal, Translocation, Sorting, Algorithm
PDF Full Text Request
Related items