An Algorithm Based On Suffix Tree For Identification Of Repeats In DNA Sequence

Posted on:2009-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:X W Wang

Full Text:PDF

GTID:2178360272478057

Subject:Computer software and theory

Abstract/Summary:

Repeat identification is one of the important methods used in bioinformatics to analyze gene sequences. Repetitive DNA sequences occupy a crucial position in eukaryotic genes. Through repeat identification, the evolutionary rules of genomes and the genetic laws of many diseases can be found. Many transposons that contain coding regions exist in genome sequences, so identifying these repeats is significant for decoding the genome.

This paper proposes an algorithm named RepSeeker, which identifies element repeats by taking both the length and the frequency of repetitive sequences into account. The method applies a minimum frequency threshold while extending repeats as far as possible by merging overlapping repeats. Its input is a suffix tree constructed from DNA sequences, its core is a search algorithm based on suffix trees, and its output is a classified table of element repeats.

To improve the efficiency of RepSeeker, the Ukkonen algorithm for suffix tree construction was modified. During construction, the leaves are numbered and leaf lists are stored in the branch nodes. RepSeeker then uses a search algorithm based on this type of suffix tree, which avoids frequent subtree traversals. The modification increases the space requirement but has little influence on the time complexity of the Ukkonen algorithm. Several representative DNA sequences from NCBI were used as test data, and the results were compared with those obtained before the improvement. The results show that the running time of RepSeeker was substantially reduced while accuracy remained comparable.
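
The idea of storing leaf lists in branch nodes and merging overlapping repeats can be illustrated with a minimal sketch. This is not the thesis's RepSeeker implementation: it swaps the improved Ukkonen construction for a naive depth-limited suffix trie, and the names `Node`, `build_suffix_trie`, `find_seeds`, and `merge_overlaps`, as well as the parameters `min_len` and `min_freq`, are assumptions made for illustration only.

```python
class Node:
    """Trie node that records the start of every suffix passing through it."""

    def __init__(self):
        self.children = {}
        self.leaf_list = []  # suffix start positions found below this node


def build_suffix_trie(text, max_depth):
    """Naive construction, O(len(text) * max_depth).

    Stands in for the improved Ukkonen build described in the abstract.
    """
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:start + max_depth]:
            node = node.children.setdefault(ch, Node())
            node.leaf_list.append(start)
    return root


def find_seeds(root, min_len, min_freq):
    """Return {substring: start positions} for substrings of length min_len
    that occur at least min_freq times.

    The frequency is read from the stored leaf list, so no subtree
    traversal is needed to count occurrences.
    """
    seeds = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) == min_len:
            if len(node.leaf_list) >= min_freq:
                seeds[prefix] = list(node.leaf_list)
            continue
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
    return seeds


def merge_overlaps(seeds, min_len):
    """Merge overlapping seed occurrences into maximal (begin, end) regions."""
    intervals = sorted(
        (s, s + min_len) for starts in seeds.values() for s in starts
    )
    merged = []
    for begin, end in intervals:
        if merged and begin < merged[-1][1]:  # strict overlap with last region
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])
    return [(b, e) for b, e in merged]


if __name__ == "__main__":
    dna = "ATATAT"
    trie = build_suffix_trie(dna, max_depth=2)
    seeds = find_seeds(trie, min_len=2, min_freq=2)
    print(seeds)
    print(merge_overlaps(seeds, min_len=2))
```

The sketch trades time and generality for brevity: a real suffix tree built in linear time would replace the trie, and the merging rule used by RepSeeker may differ from the simple interval merge shown here.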

Keywords/Search Tags:

Bioinformatics, Repeat Identification, Suffix Tree, RepSeeker

Related items

1	Finding MUMs With Enhanced Suffix Arrays
2	Research And Application Of Bioinformatics Data Fusion And Search Algorithms For Translational Medicine
3	Research On Construction Of Index Structure For Biological Sequences
4	Study On Algorithms For Identification Of Repeats In Large-scale Genome
5	Multi-pattern Matching With Wildcards Based On Suffix Tree And Suffix Array
6	Research Of High-efficiency Motif Finding Algorithms
7	Research Of Finding Maximal Unique Matches In Genome
8	Research On Vietnamese News Topic Recognition Method Based On Suffix Tree Clustering Algorithm
9	Research On Temporal XML Index Based On Suffix Tree
10	Research Of A Suffix Tree Based Automatic Wrapper Generation Method