Font Size: a A A

Scaffold Filling Under The Common Adjacency Distance

Posted on:2014-02-24Degree:MasterType:Thesis
Country:ChinaCandidate:Q ZhaoFull Text:PDF
GTID:2248330398959204Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of genome sequencing technology, it is possible to sequence more organisms for genomic analysis; however, many released genomes are incomplete genomes which are called scaffolds. Scaffolds cannot be used for genomic analysis because they could introduce errors. A feasible method is filling the missing genes into scaffolds, which results a problem called scaffold filling that gradually becomes a hot research field of computational biology. Scaffold filling uses a genome from a close species as a reference, and fills the missing genes into scaffolds to make scaffolds complete and close to the original genome. Scaffold filling is one-sided or two-sided according to the reference genome is complete or not.In this paper, a character represents a gene family and a string represents a genome. When the input genome contains no gene repetition, scaffold filling problem is polynomially solvable under the breakpoint distance or DCJ distance. When the input genome contains gene repetition, scaffold filling problem is NP-Complete under the breakpoint distance or common adjacency distance.At present, the major study has focused on the general case of scaffold filling problem with gene repetition, that is filling a string of arbitrary length at the same position is allowed, however, this is not common in many cases that only1or2genes are filled at the same position. Therefore, in this paper, we consider a special scaffold filling problem that if the string of length l can be filled at the same position of a scaffold to make the adjacencies increased by1+1under the common adjacency distance, then the value of l is only1or2. In the special scaffold filling problem, maximize the distance between two genomes is our objective. The above problem is abbreviated as SF-MNCA(2)(Scaffold Filling to Maximize the Number of Common Adjacencies). The main contribution of this paper is as follows.(1) Firstly prove that SF-MNCA(2) is NP-Complete. This paper uses the proof method of NP-Complete theory to prove that SF-MNCA(2) is NP-Complete by making a reduction from the NP-Complete problem3DM to one-sided SF-MNCA(2) in polynomial time.(2)Design a2-approximation algorithm for one-sided SF-MNCA. This algorithm has deep theoretical significance because it lays a theoretical foundation for the subsequent algorithms. This paper gives the detail process and correctness proof of the2-approximation algorithm.(3) Design a6/5+ε-approximation algorithm for one-sided SF-MNCA(2). This is a local search algorithm base on the2-approximation algorithm of one-sided SF-MNCA. This paper designs the algorithm, analyzes its running time and proves its factor is6/5+ε.
Keywords/Search Tags:Scaffold Filling, Common Adjacencies, Approximation Algorithm, Local Search
PDF Full Text Request
Related items