An efficient method for searching compressed genomic databases

Posted on: 2009-04-19
Degree: M.S.C.S
Type: Thesis
University: University of Nevada, Reno
Candidate: Wallace, Jeffrey B
Full Text: PDF
GTID: 2448390002997439
Subject: Biology
Abstract/Summary:
Searching for similarities between DNA sequences is a fundamental task for genomics researchers. This task is becoming increasingly difficult in the face of dramatic and sustained growth in the amount of genomic data being generated. In 2005, the genomic databases at the National Center for Biotechnology Information (NCBI) received approximately 50 million web hits per day, at peak rates of about 1,900 hits per second. As these databases become more popular, there is increased demand to make them faster and more efficient. This thesis proposes a method for compressing and searching selected genome databases using techniques appropriate for computers of virtually any size.

This search technique is expected to produce its best results with large search sequences against large DNA databases, and lends itself to parallel computation techniques with little communication overhead required. Because the compression algorithm uses a lossless binary encoding format, search results are exact, not approximate. Furthermore, searches take place on the compressed data, obviating the need for decompression prior to executing a search.

This thesis provides background information on existing tools for sequence alignment, presents details of our proposed method, evaluates the performance of our algorithm by comparing it with WND-BLAST, and outlines directions of future work.
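
The abstract does not specify the encoding itself, so the following is only a minimal sketch of the general idea of exact search over losslessly packed DNA, not the method of the thesis. It assumes a plain 2-bit-per-base code over the alphabet A, C, G, T; the names `CODE`, `pack`, `unpack`, and `find_all` are hypothetical and chosen for illustration.

```python
# Hypothetical 2-bit code per base. The actual encoding used in the
# thesis is not given in the abstract; this mapping is an assumption.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into one integer, 2 bits per base.

    The first base ends up in the highest-order bits.
    """
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover the original string, showing the packing is lossless.

    The length must be stored separately, because leading 'A' bases
    are encoded as zero bits.
    """
    return "".join(
        BASES[(value >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


def find_all(db_len, db_bits, query):
    """Return every offset at which query occurs in the packed database.

    Only packed bit patterns are compared; the database is never
    expanded back into text.
    """
    m = len(query)
    if m == 0 or m > db_len:
        return []
    q = pack(query)
    mask = (1 << (2 * m)) - 1  # keeps exactly one query-sized window
    hits = []
    for i in range(db_len - m + 1):
        shift = 2 * (db_len - m - i)  # aligns window i with the low bits
        if (db_bits >> shift) & mask == q:
            hits.append(i)
    return hits


database = "ACGTACGT"
packed = pack(database)
print(unpack(packed, len(database)))          # ACGTACGT
print(find_all(len(database), packed, "CGT"))  # [1, 5]
```

Because each window is compared bit for bit, a match in this sketch is exact rather than approximate, and the packed form uses a quarter of the space of one byte per base. The sketch favors clarity over speed: a practical implementation would work on fixed-width machine words and could split the database into independent chunks for parallel search.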
Keywords/Search Tags: Search, Method, Genomic, Databases