Font Size: a A A

Algorithms For Sorting Signed Genomes By Multiple-Operation Rearrangements

Posted on:2012-12-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:F C HaoFull Text:PDF
GTID:1488303353951789Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Comparative Genomics is an important research branch in Bioinformatics. In this field, computing the evolve distance between two species, namely the genome rearrangement problem, is essential. The applications includes reconstructing the evolutionary distance tree, exploring gene functions, and analyzing the principle of pathogenic disease. A genome is a set of chromosomes, while a chromosome is a sequence of genes that does not have an orientation. In this thesis each gene is represented by an integer. If the direction of a gene is known, the sign ("+" or "-") is added to the gene to indicate its direction.Sorting genomes reveals the phenomenon of sequence order changing of genomes, which is abstracted to be some fundamental operations such as Reversal, Translocation, Transposition in the genomes rearrangement problem.The genomes rearrangement problem is essential in Comparative Genomics. Given two genomes A and B, the target of the genomes rearrangement problem is to find a sequence of operations transforming A to B, whose length should be the minimum among all the possible sequences. The number of the minimum sequence is called sorting distance between A and B. People have proposed various of algorithms for different genomes rearrangement problems.After Bafna and Pevzner gave a 1.5-approximation algorithm for sorting genomes by reversals, Hannenhalli and Pevzner designed an O(n5) time accurate algorithm, followed by an O(n2) one given by Kaplan, Shamir and Tarjan. For the problem sorting genomes by translocations, firstly Hannenhalli gave an O(n) time algorithm, then Zhu Daming and Ma Shaohan improved the time complexity to be O(n2logn) followed by Wang Lusheng, Zhu Daming, Ma Shaohan and Liu Xiaowen’s O(n2) time one, then Li Guojun, Qi Xingqin, Wang Xiaoli and Zhu Binhai designed an O(n) time algorithm only computes the translocation distance. For the problem sorting genomes by transpositions, before Bulteau, Fertin and Rusu proved that is NP-hard, Bafna and Pevzner gave a 1.5-approximation algorithm with the time complexity of O(n2), Hartman and Shamir gave a 1.5-approximation concise algorithm, Elias and Hartman gave a 1.375-approximation one.It was named sorting genomes by multiple-operation rearrangements when the rearrangement involves more than one operations. It seems more valuable because this problem is more general than sorting genomes by single-operation rearrangements. Walter, Dias and Meidanis studied the problem sorting genomes by reversals and transpositions, then they gave a 2-approximation algorithm. Gu, Peng and Sudborough gave a 2-approximation algorithm for sorting genomes by reversals, transpositions and transreversals. Hartman and Sharan gave a 1.5-approximation algorithm for sorting genomes by reversals and transreversals. Hannenhalli and Pevzner studied the problem sorting genomes by reversals and translocations, giving the first accurate algorithm, followed by Yin Xiao and Zhu Daming’s new one. Yin Xiao and Zhu Daming also developed the operations of translocation, fission and fusion to be a generalized translocation and proposed the problem sorting genomes by generalized translocations, giving an O(n(?)) time accurate algorithm.It was discovered that insertion or deletion of some gene sequence interval happens accompany the rearrangement process, while many phylogenetic instances take these two operations as necessary. Qi Xingqing, Li Guojun, Li Shuguang et al. first gave an OPT+2 approximate polynomial time algorithm for sorting genomes by translocations and deletions before they proposed sorting genomes by translocations, insertions and deletions with a brief heuristic method. Here OPT is the minimum number of rearrangement operations in the process sorting the source genome to the target one.We studied the algorithms for sorting genomes by translocations, insertions and deletions in this paper, the input of which is two signed genomes A, B. Sorting genomes by translocations and deletions requires finding a transforming sequence with the minimum number of translocation and deletion operations. While sorting genomes by translocations, insertions and deletions requires finding the sequence with least translocation, insertion and deletion operations.In this thesis, we studied the algorithms for sorting genomes by translocations and deletions first. The cycle-graph is used to present the difference in sequence order between the two input genomes. There is a special sub-graph named Minimum Sub Permutation(MSP), the quantity, location, and some other properties of which is the key factors affect the translocation and deletion distance. We use forest display MSPs and their adjacency relation. According to the different values of the forest arguments, we classified the instantiation space of input into 34 cases, then gave the accurate translocation and deletion distance for each case. The distance lower bound formula given by Qi Xingqing, Li Guojun, Li Shuguang et al. contains 6 parameters, they are number of genes, number of chromosomes, number of cycles, number of indirect cycles, number of indirect MSPs and an auxiliary parameter judging the existance of even-isolation and the parity of MSPs’quantity. We studied the structure of the forest again and brought in 2 new parameters, the indirect MSPs that can not be elimited and indirect ordinary cycles, then gave a unify translocation and deletion distance formula suitable for all the cases. The accurate algorithm for sorting genomes by translocations and deletions was given naturally in the process analysing the distance formula. (Chapter 3)For the problem of sorting genomes by translocations, insertions and deletions, first we can construct an auxiliary genome through calling the alogorithm of sorting by translocations and deletions, then transform the source genome to be the auxiliary genome with insertions and then transform the auxiliary genome to be the target genome with calling the alogorithm of sorting by translocations and deletions again. Then, we gave a translocation, insertion and deletion distance formula according to the translocation and deletion distance formula. (Chapter 4)Finally we discussed the problem of sorting general signed genomes by multiple operations that allows the genomes containing duplicated chromosomes and genes while the rearrangement operations could be reversals, translocations, transposition, block-interchange, fission and fusion. A heuristic method is given following with the preliminary design, detailed design and some experiment result. (Chapter 5)The main contributions of the thesis are as follows:1). Gave the translocation and deletion distance formula for the first time with an accurate polynomial time algorithm.2). Improve the time complexity of algorithms for sorting genomes by translocations and deletions to be O(n2) from O(n3). 3). Gave the translocation, insertion and deletion distance formula with an accurate polynomial time algorithm for the first time, which caculates the distance and rearrangement sequence.
Keywords/Search Tags:Genome rearrangement, Sorting, Algorithm, Multi-operations
PDF Full Text Request
Related items