An Algorithm Of Finding A Breakpoint Exemplar String Of Genomes

Posted on:2012-08-30

Degree:Master

Type:Thesis

Country:China

Candidate:X G Lv

Full Text:PDF

GTID:2218330338963793

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

As the human genome project deeply developed, scientific workers found that all the human inheritance, traits and even disease etc. are all connected closely with genes. Gene carrier, namely, chromosome is a complete gene sequences. Different chromosome represents an individual in different aspects of characters. Previous studies have indicated that the similarity degree, corresponding to the same traits, of gene sequences of chromosomes of the two individuals, is related to that of two individuals of certain characters.This paper discusses only the case that one genome includes only one chromosome and allow the same genome contains duplicated genes(but the same gene most appear twice). In 1999, D.Sankoff put forward a solution, for the genome rearrangement problem which allows a chromosome to have duplicated genes, in which a subsequence of the source (target genome respectively) is generated, which contains all genes and no duplicated ones. Because there may be many situations to generate a subsequence, our goal is to find such a pair of subsequences:it contains two subsequences which require the fewest steps to be transformed to each other, among all pairs of subsequences. Therefore, we establish the mathematical model that compares the similarity of gene sequences of the two chromosomes, which assumes that two chromosomes, which have not been compared, contain the same gene types. Every chromosome is regarded as a sequence, and each gene as a letter. A chromosome's gene can repeat, and the question is that whether there is such a common subsequence between the two chromosomes, each gene of which appears once and only once. Whether there exists such common subsequence in some extent quantifies chromosomes similarity.In this paper, we consider each chromosome is allowed to have the same gene appear at most twice. We initiate an algorithm for solving this problem, analyze in detail its time complexity, present its proof of correctness, use Java to realize it, and use randomly generated experimental data to show the algorithm's efficency.The problem for finding two chromosomes' longest common subsequence has been proved to be NP-hard [3] and has inapproximability [2], so we can only design exponential algorithm to solve it, and thinking of a method to reduce the number of enumeration is the core of the algorithm we presented. Our algorithm is divided into two parts, and the first part is related to the enumeration, with an O(1.86n) time complexity, while the second part is can be completed in polynomial time, namely O(n3). If two chromosomes have a common subsequence, then it will be returned, and "no" will be returned if they don't. The time complexity of the whole algorithm is O(n31.86n), which is better than an O(n2n) time complexity of the fastest algorithm except itself(n denotes gene species number).

Keywords/Search Tags:

chromosome, gene, common subsequence, NP-hard, inapproximability

PDF Full Text Request

Related items

1	The Research On Algorithms For The Longest Common Subsequence Problem And Variants
2	Algorithms For Sorting By Short Swap And Longest Common Exemplar Subsequence
3	The Research On The Longest Common Subsequence Query Algorithm
4	Parallel Algorithm For Multiple Longest Common Subsequence And Application Research On Hadoop Platform
5	Explorations on the longest common increasing subsequence problem
6	Study On Parallel Algorithms For Longest Common Subsequence On Heterogeneous Cluster Computing Systems
7	Approximate Longest Common Subsequence Query Processing And Optimization On Biological Sequence
8	Research Of Software System For Chromosome Image Analysis
9	Genetic Algorithm In Intelligent Intrusion Detection System
10	A New GEP Algorithm Based On Multi-phenotype Chromosome And Its Application