Font Size: a A A

An Algorithm Of Finding A Breakpoint Exemplar String Of Genomes

Posted on:2012-08-30Degree:MasterType:Thesis
Country:ChinaCandidate:X G LvFull Text:PDF
GTID:2218330338963793Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As the human genome project deeply developed, scientific workers found that all the human inheritance, traits and even disease etc. are all connected closely with genes. Gene carrier, namely, chromosome is a complete gene sequences. Different chromosome represents an individual in different aspects of characters. Previous studies have indicated that the similarity degree, corresponding to the same traits, of gene sequences of chromosomes of the two individuals, is related to that of two individuals of certain characters.This paper discusses only the case that one genome includes only one chromosome and allow the same genome contains duplicated genes(but the same gene most appear twice). In 1999, D.Sankoff put forward a solution, for the genome rearrangement problem which allows a chromosome to have duplicated genes, in which a subsequence of the source (target genome respectively) is generated, which contains all genes and no duplicated ones. Because there may be many situations to generate a subsequence, our goal is to find such a pair of subsequences:it contains two subsequences which require the fewest steps to be transformed to each other, among all pairs of subsequences. Therefore, we establish the mathematical model that compares the similarity of gene sequences of the two chromosomes, which assumes that two chromosomes, which have not been compared, contain the same gene types. Every chromosome is regarded as a sequence, and each gene as a letter. A chromosome's gene can repeat, and the question is that whether there is such a common subsequence between the two chromosomes, each gene of which appears once and only once. Whether there exists such common subsequence in some extent quantifies chromosomes similarity.In this paper, we consider each chromosome is allowed to have the same gene appear at most twice. We initiate an algorithm for solving this problem, analyze in detail its time complexity, present its proof of correctness, use Java to realize it, and use randomly generated experimental data to show the algorithm's efficency.The problem for finding two chromosomes' longest common subsequence has been proved to be NP-hard [3] and has inapproximability [2], so we can only design exponential algorithm to solve it, and thinking of a method to reduce the number of enumeration is the core of the algorithm we presented. Our algorithm is divided into two parts, and the first part is related to the enumeration, with an O(1.86n) time complexity, while the second part is can be completed in polynomial time, namely O(n3). If two chromosomes have a common subsequence, then it will be returned, and "no" will be returned if they don't. The time complexity of the whole algorithm is O(n31.86n), which is better than an O(n2n) time complexity of the fastest algorithm except itself(n denotes gene species number).
Keywords/Search Tags:chromosome, gene, common subsequence, NP-hard, inapproximability
PDF Full Text Request
Related items