
Study On Techniques Of Searching For Approximate Repeats In DNA Sequences Based On Hamming Distance

Posted on: 2009-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
Full Text: PDF
GTID: 2120360308979756
Subject: Computer system architecture
Abstract/Summary:
With the launch of the Human Genome Project and the rapid growth of biological data, bioinformatics, which studies biological systems by applying mathematics, computer science, and information science, is gradually becoming one of the most important research fields. Within bioinformatics, repeat searching is a basic and important problem in DNA sequence analysis. Approximate repeat searching in particular has attracted considerable attention from researchers, because approximate repeats carry great biological significance and the search problem itself is new and complicated.

This thesis focuses on searching for two important kinds of approximate repeats: approximate tandem repeats and approximate inverted repeats. Based on the proposed definitions of the two kinds of repeats, an indexing structure and a corresponding searching algorithm are designed for each.

For approximate tandem repeats, pattern-similarity and neighbor-similarity are first proposed as similarity measures based on Hamming distance, and a new definition, Largest Neighbor-similarity-based Approximate Tandem Repeats (LNATR), is then presented. A new indexing structure named Pattern Unit Array (PUA) is designed, and an effective LNATR searching algorithm is built on it. This algorithm is compared with another approximate tandem repeat searching algorithm designed by Gad M. Landau.

For approximate inverted repeats, the thesis first presents matching-degree, based on Hamming distance, to measure the similarity between the two patterns of an inverted repeat, and uses it to define Largest Matching-degree-based Approximate Inverted Repeats (LMAIR). A Boundary Index (BI) is then designed to support LMAIR searching. Finally, a simple LMAIR searching algorithm and an optimized LMAIR searching algorithm are proposed based on BI, and the two algorithms are compared.
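
The abstract does not give the formulas for pattern-similarity, neighbor-similarity, or matching-degree, so the sketch below does not reproduce them. It only illustrates the underlying idea of measuring approximate repeats with Hamming distance. All function names are hypothetical. The tandem check compares each repeat unit with its right-hand neighbor, and the inverted check assumes the common convention that the right arm is the reverse complement of the left arm (the thesis may use a different convention).

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


# Translation table for DNA base complements (assumes uppercase A, C, G, T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def adjacent_unit_distances(seq: str, start: int, unit_len: int, copies: int) -> list[int]:
    """Hamming distance between each pair of neighboring units in a
    candidate tandem repeat (hypothetical helper, not the thesis's measure)."""
    units = [seq[start + i * unit_len : start + (i + 1) * unit_len] for i in range(copies)]
    if any(len(u) != unit_len for u in units):
        raise ValueError("candidate repeat runs past the end of the sequence")
    return [hamming_distance(units[i], units[i + 1]) for i in range(copies - 1)]


def inverted_arm_distance(seq: str, left_start: int, arm_len: int, gap: int) -> int:
    """Hamming distance between the left arm and the reverse complement of
    the right arm of a candidate inverted repeat (hypothetical helper)."""
    left = seq[left_start : left_start + arm_len]
    right_start = left_start + arm_len + gap
    right = seq[right_start : right_start + arm_len]
    if len(left) != arm_len or len(right) != arm_len:
        raise ValueError("candidate repeat runs past the end of the sequence")
    return hamming_distance(left, reverse_complement(right))


if __name__ == "__main__":
    # Four 3-base units: ACG, ACG, ACT, ACG
    print(adjacent_unit_distances("ACGACGACTACG", 0, 3, 4))
    # Arms GATTC and GAATC separated by a 3-base gap
    print(inverted_arm_distance("GATTCAAAGAATC", 0, 5, 3))
```

A searching algorithm would scan many candidate positions and lengths and keep those whose distances stay under a chosen threshold. Index structures such as the PUA and BI named in the abstract exist to avoid that brute-force scan, and their internals are not described here.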
Keywords/Search Tags: DNA sequence, approximate tandem repeats, approximate inverted repeats, Pattern Unit Array, Boundary Index