Font Size: a A A

Using genetic algorithm to solve median problem and phylogenetic inference

Posted on:2015-03-23Degree:Ph.DType:Dissertation
University:University of South CarolinaCandidate:Gao, NanFull Text:PDF
GTID:1478390017998264Subject:Computer Science
Abstract/Summary:
Genome rearrangement analysis has attracted a lot of attentions in phylogenetic computation and comparative genomics. Solving the median problems based on various distance denitions has been a focus as it provides the building blocks for maximum parsimony analysis of phylogeny and ancestral genomes. The Median Problem (MP) has been proved to be NP-hard and although there are several exact or heuristic algorithms available, these methods all have great diculty to compute when the three genomes are distant and contain high evolution events. However, current approaches, such as MGR and GRAPPA, are restricted on small collections of genomes and low-resolution gene order data of a few hundred rearrangement events. In my work, we focus on heuristic algorithms which combine genetic algorithm (GA) with genomic sorting algorithm to produce new methods and directions for whole genome median solver, ancestor inference and phylogeny reconstruction.;For median problem, we propose two genetic algorithms based on different distance measurements: GA-Inversion and GA-DCJ. Based on genetic algorithm frame, we develop our algorithms of every procedure and embed it into traditional genetic algorithm. The final results of our GA-based algorithm are optimal median genome(s) and its median score. Our algorithm use limited time and space, especially in large scale and distant datasets and the important aspect is its better results compared with GRAPPA and AsMedian. What is more, using genetic algorithm can provide more than one genome which has best median score, so it provides larger chance to get accurate genome structure compared with its ancestor. Compared with GRAPPA and AsMedian, our GA-based algorithm provides excellent accuracy and robustness in ancestor genome through our simulation experiments.;Extending the ideas of equal genome median solver, we develop an algorithm to solve DCJ-indel median problem, GaDCJ-indel, which can solve unequal genomes median problem (without duplication). In DCJ-Indel model, one of the key steps is still sorting operation. The difference here is there are two sorting directions: minimal DCJ operation path or minimal indel operation path. Following different sorting path, in each step scenario, we can get various genome structures to fulfill our population pool. We adopt adaptive surcharge-triangle inequality in our fitness function in order to get effective and efficient results. Rearrangements of genomes through operations such as reversal and transposition are rare events that enable researchers to reconstruct the evolutionary history among species.;An important application of genome rearrangement analysis is to infer ancestral genomes, which is valuable for identifying patterns of evolution and for modeling the evolutionary processes. However, computing ancestral genomes is very difficult and we have to rely on heuristic methods that have various limitations. We propose a GA-Tree algorithm which adapts meta-population, co-evolution and repopulation pool methods In this paper, we describe the first genetic algorithm for ancestor inference, which uses fitness scores designed to consider co-evolution and uses sorting-based methods to initialize and evolve populations. Our extensive experiments show that compared with other existing tools, our method is accurate and can infer ancestors that are much closer to true ancestors.
Keywords/Search Tags:Median, Genetic, Genome, Solve, Ancestor
Related items