Study On Techniques Of Searching For Approximate Repeats In DNA Sequences Based On Hamming Distance

Posted on:2009-11-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhao

Full Text:PDF

GTID:2120360308979756

Subject:Computer system architecture

Abstract/Summary:

With the launch of the Human Genome Project and the rapid growth of biological data, bioinformatics, which studies biological systems by applying mathematics, computer science and information science, is becoming one of the most important research fields. Within bioinformatics, repeat searching is a basic and important DNA sequence analysis problem. Approximate repeat searching in particular has attracted many researchers, both because approximate repeats carry great biological significance and because the search problem itself is new and complicated.

This thesis focuses on searching for two important kinds of approximate repeats: approximate tandem repeats and approximate inverted repeats. Based on proposed definitions of the two kinds of repeats, an indexing structure and a corresponding search algorithm are designed for each.

For approximate tandem repeats, pattern-similarity and neighbor-similarity are first proposed as similarity measures based on Hamming distance, and a new definition, Largest Neighbor-similarity-based Approximate Tandem Repeats (LNATR), is presented. A new indexing structure named Pattern Unit Array (PUA) is then designed, on which an effective LNATR searching algorithm is built. The algorithm is compared with another approximate tandem repeat searching algorithm designed by Gad M. Landau.

For approximate inverted repeats, the thesis first presents matching-degree, based on Hamming distance, to measure the similarity between the two patterns of an inverted repeat, and uses it to define Largest Matching-degree-based Approximate Inverted Repeats (LMAIR). A Boundary Index (BI) is then designed to support LMAIR searching. Finally, a simple LMAIR searching algorithm and an optimized LMAIR searching algorithm are proposed based on BI, and the two are compared.
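
As a rough illustration of the Hamming-distance measure the abstract builds on, the sketch below counts mismatches between adjacent units of a candidate tandem repeat and between the two arms of a candidate inverted repeat. The abstract does not give the thesis's formulas for neighbor-similarity or matching-degree, nor the layout of PUA or BI, so none of those are reproduced here. The function names, the parameters, and the assumption that the second arm of an inverted repeat is compared as a reverse complement are all hypothetical choices made for this example.

```python
# Illustrative only: naive mismatch counts, not the thesis's LNATR/LMAIR algorithms.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(s: str) -> str:
    """Reverse the sequence and swap each base for its complement."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def neighbor_distances(seq: str, start: int, unit_len: int, copies: int) -> list[int]:
    """Hamming distance between each pair of adjacent units (hypothetical helper)."""
    if start < 0 or start + unit_len * copies > len(seq):
        raise ValueError("candidate repeat runs past the sequence")
    units = [seq[start + i * unit_len: start + (i + 1) * unit_len]
             for i in range(copies)]
    return [hamming(units[i], units[i + 1]) for i in range(copies - 1)]


def inverted_arm_mismatches(seq: str, left_start: int, arm_len: int, gap: int) -> int:
    """Mismatches between the left arm and the reverse complement of the right arm.

    Assumes the biological convention that an inverted repeat pairs a pattern
    with its reverse complement; the thesis may define it differently.
    """
    right_start = left_start + arm_len + gap
    if left_start < 0 or right_start + arm_len > len(seq):
        raise ValueError("candidate repeat runs past the sequence")
    left = seq[left_start: left_start + arm_len]
    right = seq[right_start: right_start + arm_len]
    return hamming(left, reverse_complement(right))


if __name__ == "__main__":
    print(neighbor_distances("ACGTACGAACGA", 0, 4, 3))
    print(inverted_arm_mismatches("ACGGTTTCAGT", 0, 4, 3))
```

A search procedure would have to evaluate counts like these over many candidate positions and lengths, which is presumably the cost that index structures such as PUA and BI are meant to reduce.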

Keywords/Search Tags:

DNA sequence, approximate tandem repeats, approximate inverted repeats, Pattern Unit Array, Boundary Index

Related items

1	Study On Techniques Of Searching For Approximate Repeats In DNA Sequences Based On Hamming Distance
2	The Study On The Isothermal Amplification Characteristics Of Tetranucleotide Repeats And The Features Of Inverted Repeats In Genomes
3	Nonrandom Clusters Of Close Inverted Repeats In Herpesviruses And Poxviruses
4	A New Pipeline For Targeted Profiling Of Short Tandem Repeats In Massively Parallel Sequencing Data
5	Bioinformatic Analysis Of Tandem Repeats And Non-Coding RNA In Deinococcus Radiodurans
6	Primary Studies Of Quasi-period Patterns In Alu Repeats
7	The Theoretical And Experimental Study Of Nucleosome Positioning On The Sequences Containing GAA Triplet Repeats And R5Y5 Motif
8	Nonlinear Perturbation Equations Approximate Noether Symmetry And Approximate Conservation Law
9	Studies On Genotyping Of STR And SNP With Near Infrared Spectroscopy And Chemical Pattern Recognition
10	Probe Design For Peptide Nucleic Acids Chip Based On Suffix Array