Study On Algorithms For Identification Of Repeats In Large-scale Genome

Posted on:2008-01-11

Degree:Master

Type:Thesis

Country:China

Candidate:F Bai

GTID:2178360212974585

Subject:Computer software and theory

Abstract/Summary:

Repeat identification is one of the most fundamental problems in genome analysis in modern bioinformatics. Identifying repeats helps reveal their roles in genome evolution and in the inheritance of disease. Genome sequences contain many transposons and retrotransposons, some of which include coding regions, so identifying these repeats is important for decoding a genome. Although many algorithms have been proposed for this problem, none of them is optimal. To address the shortcomings of current approaches, we present a new class of repeat-identification algorithms based on seed sequences.

This thesis proposes two methods, RepeatSearcher and GSRSearcher, both based on the extension of seed sequences. RepeatSearcher takes the sequences that contain a seed and turns local pairwise alignment into multiple sequence alignment, applying a gap penalty within a limited region. The algorithm extends the consensus sequence according to the alignment score and, at the same time, extends every individual repeat copy, so that accurate repeat boundaries can be determined while the consensus grows. Using multiple alignment largely avoids the imprecision of high-scoring pairs. GSRSearcher inherits this seed-extension strategy and adds the statistical machinery of Gibbs sampling. Because it accounts for the influence of the genomic background, the repeat families it identifies are more accurate. Owing to its probabilistic approach, GSRSearcher converges at a more reasonable rate than RepeatSearcher and determines repeat boundaries accurately.

Finally, RepeatSearcher and GSRSearcher were tested on twelve mammalian genome sequences, and their output was compared with RepBase and with the output of RECON. The results show that our algorithm outperforms RECON and is effective.
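The seed-and-extend idea behind RepeatSearcher can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes exact seed matches, ungapped extension (the gap penalty applied within a limited region is omitted), and a simple majority-vote stopping rule. The function names and the `min_agree` threshold are hypothetical. The sketch collects every occurrence of a seed, extends all copies column by column in both directions, and stops at the first column where the share of copies agreeing with the majority base falls below the threshold. That column is treated as the repeat boundary, so the consensus and every copy are extended together.

```python
from collections import Counter


def find_seed_hits(genome: str, seed: str) -> list[int]:
    """Start positions of every exact occurrence of `seed` (overlaps allowed)."""
    hits = []
    pos = genome.find(seed)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(seed, pos + 1)
    return hits


def extend(genome: str, starts: list[int], seed_len: int, step: int, min_agree: float) -> int:
    """Count how many columns all copies can be extended in one direction.

    step=+1 walks right from the seed's end, step=-1 walks left from its start.
    Extension stops when any copy leaves the sequence or when the fraction of
    copies sharing the majority base drops below `min_agree` (hypothetical rule).
    """
    n = 0
    while True:
        if step > 0:
            cols = [s + seed_len + n for s in starts]
        else:
            cols = [s - 1 - n for s in starts]
        if any(c < 0 or c >= len(genome) for c in cols):
            return n  # a copy ran off the end of the sequence
        _, count = Counter(genome[c] for c in cols).most_common(1)[0]
        if count / len(cols) < min_agree:
            return n  # agreement collapsed: treat this column as the boundary
        n += 1


def seed_consensus(genome: str, seed: str, min_agree: float = 0.75):
    """Return (consensus, copy intervals) for a seed, or None if it is not repeated."""
    starts = find_seed_hits(genome, seed)
    if len(starts) < 2:
        return None
    left = extend(genome, starts, len(seed), -1, min_agree)
    right = extend(genome, starts, len(seed), +1, min_agree)
    # Majority vote in every aligned column gives the consensus sequence.
    consensus = "".join(
        Counter(genome[s + off] for s in starts).most_common(1)[0][0]
        for off in range(-left, len(seed) + right)
    )
    # Each copy is extended together with the consensus (half-open intervals).
    copies = [(s - left, s + len(seed) + right) for s in starts]
    return consensus, copies
```

On the toy input `"TT" + "ACGTACGGA" + "CC" + "ACGTACGCA" + "GG" + "ACGTACGGA" + "AA"` with seed `"CGTAC"` and `min_agree=0.6`, the sketch returns the consensus `"ACGTACGGA"` (the single substitution in the second copy is outvoted) and the copy intervals `[(2, 11), (13, 22), (24, 33)]`. Overlapping seed hits, as in tandem repeats, and indels would need the gapped, scored extension the abstract describes.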

Keywords/Search Tags:

bioinformatics, repeat identification, consensus sequence, accurate boundaries, seed sequence
