Font Size: a A A

Algorithms For Genomic Structure Ananlysis

Posted on:2019-02-14Degree:DoctorType:Dissertation
Country:ChinaCandidate:L R PuFull Text:PDF
GTID:1318330545453578Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The problem of sorting genomes by rearrangements,which asks to compute rearrangement distance between two genomes and find a shortest sequence of rearrangements that transform one genome into another,is one of the typical problems in computational genomics.The rearrangement distance between two genomes can be used to approximate the evolutionary distance between these two genomes.Although rearrangements in bio-genomes are very complicated,they can be formalized as some basic operations such as reversal,translocation and transposition.Reversal and transposition are rearragment operations working on a single chromosome,while translocation is a rearrangement operation working on two different chromosomes.Thus,compared with reversal and transposition,translocation plays a more important role in the genome rearrangement process.Besides,there have been known translocation operations happened in wheat genomes when under high temperature stress.This study demonstrate that studies of translocation distance between genomes will be helpful for us to better understand the genome evolutionary history.Currently,by using breakpoint graph,the problem of sorting signed genomes by translocations has been solved polynomially.However,the problem of sorting unsigned genomes by translocations has been proved to be NP-hard.As a result,researchers are devoted to design polynomial approximation algorithms for these,problems.To design approximation algorithm for the problem of sorting unsigned genomes by translocaitons,we usually try to search for a good cycle decomposition of the corresponding breakpoint graph.Thus,a good algortihm for breakpoint graph cycle decomposition problem can be used in designing better algorithms for approximating those genome rearrangement problems.While breakpoint graph represents the workhorse of genome rearrangement studies,it remains unclear how to apply breakpoint graph for analyzing the evolutionary history of segmental duplications.Segmental duplications are hotbeds for genomic rearrangements folowed by gene innovation and rapid adaption.Variations in segmental duplications have been linked to various genetic diseases including Hemophilia A,the Smith-Magenis syndrome,the Angelman syndrome,and many others.With the development of sequencing technology,the assembly of human genome becomes very accurate now.What follows is the detection and analysis of segmental duplications in these assembled genomes.An accurate detection of segmental duplications in human genome and detail analysis of it will help researchers better understand the human evolutionary history and explore the clinical treatment of the genetic diseases.This paper focus on research of genome structure such as genome rearrangement and segmental duplication events.The main contributions of this thesis are as follows:1.A linear time algorithm to detect whether a breakpoint graph can be decomposed into none other than 2-cycles.The problem of breakpoint graph cycle decomposition asks to find a largest collection of edge-disjoint cycles in a breakpoint graph.This problem is NP-hard,and can be approximated to 1.4193+?.To design approximation algorithm for breakpoint graph cycle decomposition problem,we usually try to decompose the breakpoint graph into as many 2-cycles as possible.In this paper,we present a linear time algorithm to detect whether a breakpoint graph can be decomposed into none other than 2-cycles.If yes,the algorithm will return a 2-cycle decomposition of the breakpoint graph.2.Design an approximation algorithm for unsigned translocation sorting problem.The problem of sorting unsigned genomes by translocations has been proved to be NP-hard,and its best approximation ratio is 1.408+ e.In this paper,we design an approximation algorithm for unsigned translocation sorting problem and improve the approximation ratio of this problem from 1.408+ ?to 1.375.This approximation algorithm uses the idea of divide and conquer.To say strategically,we first decompose the breakpoint graph with respect to two unsigned genomes into a set of disconnected components,such that each component can be decomposed into alternating cycles alone,then decompose each component into as many cycles as needed to approximate the unsigned translocation sorting to 1.375.The main technique we used in this paper is to avoid decomposing small candidate subpermutaitons,which contains only one 2-cycle or 3-cycle,into independent components.3.Detecion and analysis of ancient segmental duplications in mammalian gneomes.In this paper,we present an SDquest algorithm,which can fast detect segmental duplications in mammalian genomes.We run SDquest algorithm on human,gorilla and mouse genomes,and further compared segmental duplications identified by SDquest with the known segmental duplications.Result shows that SDquest detect a lot of novel segmental duplications in human,gorilla and mouse genomes.We further construct breakpoint graph for segmental duplications identified by SDquest and apply breakpoint graph for analyzing the evolutionary history of segmental duplications.SDquest algorithm has been implemented as a software by using Java,and SDquest software is available on Github website.SDquest software is the first open and easy-to-use tool for detection of segmental duplications in mammalian genomes.
Keywords/Search Tags:Breakpoint graph, Genome rearrangement, Approximation algorithm, Segmental duplications
PDF Full Text Request
Related items