An Algorithm Based On Suffix Tree For Identification Of Repeats In DNA Sequence

Posted on: 2009-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: X W Wang
Full Text: PDF
GTID: 2178360272478057
Subject: Computer software and theory
Abstract/Summary:
Repeat identification is one of the important techniques in bioinformatics for analyzing gene sequences. Repetitive DNA sequences occupy a crucial position in eukaryotic genomes. Through repeat identification, the evolutionary rules of genomes and the genetic laws of many diseases can be found. Many transposons that contain coding regions exist in genome sequences, so identifying these repeats is significant for decoding the genome.

This paper proposes an algorithm named RepSeeker, which identifies element repeats by taking both the length and the frequency of repetitive sequences into account. The method applies a minimum frequency threshold and, at the same time, extends repeats as far as possible by merging overlapping repeats. It takes as input a suffix tree constructed from the DNA sequences, uses a search algorithm based on suffix trees, and outputs a classified table of element repeats. To improve the efficiency of RepSeeker, the Ukkonen algorithm for suffix tree construction was modified: during construction, the leaves are numbered and leaf lists are stored in the branch nodes. On this foundation, RepSeeker adopts a search algorithm based on this type of suffix tree, which avoids frequent subtree traversals.

The modification increases the space requirement but has little influence on the time complexity of the Ukkonen algorithm. Several representative DNA sequences from NCBI were used as test data, and the results were compared with those obtained before the improvement. The results show that the running time of RepSeeker was substantially reduced, with comparable accuracy.
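
The idea of storing leaf lists in branch nodes can be illustrated with a small sketch. This is not the RepSeeker implementation: for brevity it assumes a naively built, uncompressed suffix trie (quadratic in the sequence length) in place of the modified Ukkonen construction, and the names Node, build_suffix_trie, find_repeats, min_len and min_freq are hypothetical. It shows only how a per-node list of suffix start positions lets the search read a repeat's frequency and positions directly, without traversing the subtree; the merging of overlapping repeats is not shown.

```python
class Node:
    """Trie node that keeps the start positions of all suffixes passing through it."""

    def __init__(self):
        self.children = {}
        self.leaves = []  # "leaf list": suffix start positions below this node


def build_suffix_trie(seq):
    """Naive suffix trie (assumption: stands in for the modified Ukkonen tree)."""
    root = Node()
    for start in range(len(seq)):
        node = root
        node.leaves.append(start)
        for ch in seq[start:]:
            node = node.children.setdefault(ch, Node())
            # Record the suffix number on every node along its path.
            node.leaves.append(start)
    return root


def find_repeats(seq, min_len, min_freq):
    """Return {substring: start positions} for substrings that are at least
    min_len long and occur at least min_freq times."""
    root = build_suffix_trie(seq)
    repeats = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        for child in node.children.values():
            # The frequency is read from the stored leaf list; no subtree walk.
            if len(child.leaves) < min_freq:
                continue  # extensions of this substring cannot be more frequent
            length = depth + 1
            if length >= min_len:
                start = child.leaves[0]
                repeats[seq[start:start + length]] = list(child.leaves)
            stack.append((child, length))
    return repeats


if __name__ == "__main__":
    print(find_repeats("ACGTACGTTACG", min_len=3, min_freq=3))
```

The trade-off matches the one described in the abstract: each node carries an extra list, so memory use grows, while frequency lookups during the search become a constant-time length check.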
Keywords/Search Tags:Bioinformatics, Repeat Identification, Suffix Tree, RepSeeker