Software And Algorithm For Analysis Of Genome Rearrangement

Posted on: 2020-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: R Z Wang
Full Text: PDF
GTID: 2370330572971521
Subject: Computer Science and Technology
Abstract/Summary:
Genome rearrangement, which includes basic operations such as translocation, transposition, and reversal, changes the order of genes on a genome. Sorting genomes by rearrangements is therefore a classic problem in bioinformatics: it seeks to reconstruct the rearrangement process between different genomic sequences and to calculate their rearrangement distance. The rearrangement distance is the minimum number of rearrangement operations needed to transform one genome into the other, which is important for inferring the evolution of species and the relationships between them.

Research on sorting genomes by rearrangements began in the 1990s. In 1995, Sankoff et al. defined the problem of sorting genomes by reversals and proposed a greedy algorithm for it. In 1997, Caprara reduced the maximum Eulerian cycle decomposition problem to the problem of sorting genomes by reversals, which proved that sorting by reversals on undirected genomes is NP-hard. In 2001, Bader designed an algorithm that can obtain the inverted genome sequences and calculate the rearrangement distance. In 2015, Shao et al. proposed a faster algorithm for measuring the breakpoint distance of template gene sequences. One year later, they applied the algorithm to template genomes with repetitive genes. In 2018, Zhai et al. proposed an approximation algorithm with a performance ratio of 4 for the problem of sorting genomes by rearrangement.

However, most previous studies on genome rearrangement have been limited to theory or simulated data, so they cannot be used to analyze the potential relationships between chromosomes of real genomes. Moreover, previous studies rarely involved segmental duplications or explored the relationships among segmental duplications on chromosomes. Segmental duplications serve as hotspots for recombination and mutation in genomes. With the advancement of sequencing technology, segmental duplications in many large genomes have been detected and published. This study extends genome rearrangement to segmental duplications of real biological genomes in order to study the rearrangement process and analyze species relationships.

In this paper, we use the segmental duplications identified by SDquest to design a method for calculating the rearrangement distance between two segmental duplication sequences. The goal is to eliminate all breakpoints between the two sequences with the minimum number of rearrangement operations. The implementation proceeds as follows:
(1) The SDquest identification results are used to number the segmental duplications, so that chromosome sequences are modeled as segmental duplication sequences. The concept of adjacent blocks is introduced, and the sequences are divided into adjacent blocks to reduce the number of adjacencies destroyed during rearrangement operations.
(2) We design a deletion algorithm that combines a greedy strategy with accurate deletion to remove excess segmental duplications by comparing scores at different positions. A reversal algorithm, which distinguishes same-position matching, different-position matching, and a special case (insertion), then eliminates all breakpoints between the two segmental duplication sequences and records the rearrangement process.
(3) The latest human and gorilla genomes were grouped for experiments, and the number of rearrangement operations was counted and analyzed.

The main innovations of this work are:
1. A deletion algorithm based on a greedy strategy is designed and implemented to remove redundant segmental duplications between the human and gorilla genomes.
2. A reversal algorithm with an approximate performance ratio of 4 eliminates all breakpoints between two segmental duplication sequences and matches all adjacencies.
3. The genome rearrangement problem is extended to segmental duplications of human and gorilla chromosomes by modeling the segmental duplications as two sequences. We calculate the number of rearrangement operations between the two segmental duplication sequences, analyze the differences between chromosomes based on the recombination results, and then analyze the species relationship.
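
The notions of adjacency, breakpoint, and reversal that the abstract relies on can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. It assumes each segmental duplication is a signed integer ID (the sign encodes orientation), that each ID occurs once per sequence, and that the sequence ends are not counted as adjacencies. The function names `canonical`, `adjacencies`, `breakpoints`, and `reversal` are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Adjacency = Tuple[int, int]


def canonical(a: int, b: int) -> Adjacency:
    """Pick one representative of an adjacency.

    Reading (a, b) on the opposite strand gives (-b, -a), so both
    forms describe the same adjacency.
    """
    return min((a, b), (-b, -a))


def adjacencies(seq: List[int]) -> Set[Adjacency]:
    """Collect the adjacencies formed by consecutive elements of seq."""
    return {canonical(x, y) for x, y in zip(seq, seq[1:])}


def breakpoints(source: List[int], target: List[int]) -> int:
    """Count adjacencies of source that do not occur in target."""
    target_adj = adjacencies(target)
    return sum(
        1
        for x, y in zip(source, source[1:])
        if canonical(x, y) not in target_adj
    )


def reversal(seq: List[int], i: int, j: int) -> List[int]:
    """Reverse seq[i..j] (inclusive) and flip the sign of each element."""
    return seq[:i] + [-x for x in reversed(seq[i:j + 1])] + seq[j + 1:]


# Illustrative data only, not taken from the thesis.
target = [1, 2, 3, 4, 5]
source = [1, -3, -2, 4, 5]

print(breakpoints(source, target))                     # 2
print(breakpoints(reversal(source, 1, 2), target))     # 0
```

In this toy case a single reversal of the segment at positions 1 to 2 removes both breakpoints. Real segmental duplication sequences contain repeated IDs, which is why the thesis first applies a deletion step; the sketch ignores that case.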
Keywords/Search Tags:segmental duplication, sorting genome by rearrangement, adjacency, breakpoint