Analysis of data partitioning on correlated data to genetic sequence searches using string matching algorithms

Posted on: 2007-01-12
Degree: M.S.
Type: Thesis
University: The University of Alabama in Huntsville
Candidate: Nance, David R
Full Text: PDF
GTID: 2448390005975355
Subject: Computer Science
Abstract/Summary:
A distributed approach to genetic database sequence searches requires partitioning the data into multiple sections. However, the nature of the data means that a partition boundary can cut the queried sequence into unrecognizable pieces. Adding overlap to each partition, of length less than or equal to half the length of the query sequence, corrects this problem. This thesis demonstrates the approach using English texts. The English texts were first correlated with genetic data, then partitioned into groupings of various sizes, with overlap applied in incremental steps to the smallest partition size. The Knuth-Morris-Pratt and Boyer-Moore string search algorithms were used to locate a small query sequence that had been cut during partitioning and was recovered through the use of overlap.
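The boundary problem and the overlap fix can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes one reading of the abstract, in which each partition is extended on both sides by an overlap of half the query length, and it uses Python's built-in `str.find` as a stand-in for the Knuth-Morris-Pratt and Boyer-Moore searches. The function names, the sample sequence, and the partition size are all hypothetical.

```python
def partition_with_overlap(text, size, overlap):
    """Yield (start, chunk) pairs. Each chunk covers a nominal range of
    `size` characters, extended by `overlap` characters on both sides
    (an assumed scheme, not necessarily the one used in the thesis)."""
    for begin in range(0, len(text), size):
        start = max(0, begin - overlap)
        end = min(len(text), begin + size + overlap)
        yield start, text[start:end]


def find_all(text, query):
    """Return every start index of `query` in `text`.
    str.find stands in for KMP or Boyer-Moore here."""
    hits = []
    pos = text.find(query)
    while pos != -1:
        hits.append(pos)
        pos = text.find(query, pos + 1)
    return hits


def partitioned_search(text, query, size, overlap):
    """Search each partition independently, then merge the hits
    into offsets relative to the unpartitioned text."""
    hits = set()  # a set removes duplicates found in overlapping regions
    for start, chunk in partition_with_overlap(text, size, overlap):
        hits.update(start + p for p in find_all(chunk, query))
    return sorted(hits)


# Hypothetical data: the query straddles the partition boundary at index 10.
text = "ACGTACGTTAGCCGATAGGCTA"
query = "TAGCCG"

without = partitioned_search(text, query, size=10, overlap=0)
with_overlap = partitioned_search(text, query, size=10, overlap=len(query) // 2)

print(without)       # the cut occurrence is missed
print(with_overlap)  # the occurrence is recovered
```

Under this two-sided scheme, a match cut by a boundary always has one piece no longer than half the query, so the neighbouring partition's extension covers it, provided the partitions are longer than the query.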
Keywords/Search Tags:Sequence, Partitioning, Genetic