
Location And Analysis Of Repetitive Sequences In Genomes

Posted on: 2006-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: X M Wang
Full Text: PDF
GTID: 2178360185463364
Subject: Control Science and Engineering
Abstract/Summary:
Repetitive sequences make up a significant fraction of genomes. According to how they are arranged, they fall into two broad groups: tandem repeats and dispersed repeats. A tandem repeat consists of two or more contiguous copies of a sequence. Tandem repeats play a variety of roles in gene expression, regulation, and evolution, and they are ideal markers for genetic mapping. Furthermore, DNA fingerprinting, a technique based on the polymorphism of tandem repeats, is now widely used in fields such as forensic medicine. More striking still, some genetic diseases have in recent years been found to be associated with certain trinucleotide repeats. Research on tandem repeats is therefore of great theoretical and practical importance.

However, systematic genome-wide analysis of tandem repeats relies on extensive algorithmic support, especially algorithms for locating tandem repeats, which are the basis of repeat analysis. The main research task of this thesis is therefore the algorithm for locating tandem repeats. A new algorithm for locating exact tandem repeats is presented. Based on a new data structure, the suffix array together with the LCP (longest common prefix) array, it can locate all exact tandem repeats in a genome without any prior knowledge. The exact tandem repeats are then used as seeds and extended by wraparound dynamic programming alignment to obtain all valid approximate tandem repeats containing insertions, deletions, and mismatches. This constitutes the algorithm for locating approximate tandem repeats.

A software tool, RepLocate, is implemented; it can effectively locate all tandem repeats in genome sequences. To demonstrate its utility and efficiency, RepLocate has been applied to real genomic DNA sequences, and the results and running times are reported.
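The abstract does not give the details of the thesis's algorithm, so the following is only a minimal sketch of how a suffix array and LCP array can be used to find exact tandem repeats, not RepLocate's actual method. It assumes a deliberately naive suffix array construction and a quadratic scan, suitable for short illustrative sequences only; the function names (`suffix_array`, `lcp_array`, `exact_tandem_repeats`) are hypothetical. The idea: two suffixes starting at positions i and i + p share a common prefix of length at least p exactly when the substring of length 2p starting at i is a tandem repeat with period p, and that common prefix length is the minimum of the LCP values between the two suffixes in suffix array order.

```python
def suffix_array(s):
    # Naive construction by sorting suffixes; fine for a small illustration,
    # far too slow for real genomes (assumption: short input).
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # Kasai's algorithm: lcp[k] is the length of the longest common prefix
    # of the suffixes starting at sa[k - 1] and sa[k]; lcp[0] is 0.
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def exact_tandem_repeats(s):
    # Returns sorted (start, period, copies) triples, where `copies` is the
    # number of full copies of the period-length unit beginning at `start`.
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    n = len(s)
    found = set()
    for k in range(n):
        shared = n  # running minimum of lcp[k + 1 .. j]
        for j in range(k + 1, n):
            shared = min(shared, lcp[j])
            if shared == 0:
                break  # later suffixes share no prefix with sa[k]
            start, other = sorted((sa[k], sa[j]))
            period = other - start
            if period <= shared:
                found.add((start, period, 1 + shared // period))
    return sorted(found)
```

For example, `exact_tandem_repeats("ACGACGACGT")` returns `[(0, 3, 3), (1, 3, 2), (2, 3, 2), (3, 3, 2)]`: three copies of `ACG` starting at position 0, plus the shifted two-copy repeats that overlap it. A practical tool would additionally merge such overlapping hits and use a linear-time suffix array construction.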
Keywords/Search Tags: Genome, Repetitive Sequences, Tandem Repeat, Approximate Tandem Repeat, Suffix Array, LCP Array, Algorithm for Locating Tandem Repeats