An efficient method for searching compressed genomic databases

Posted on:2009-04-19

Degree:M.S.C.S

Type:Thesis

University:University of Nevada, Reno

Candidate:Wallace, Jeffrey B


GTID:2448390002997439

Subject:Biology

Abstract/Summary:

Searching for similarities between DNA sequences is a fundamental task for genomics researchers. This task is becoming increasingly difficult in the face of dramatic and sustained growth in the amount of genomic data being generated. In 2005, the genomic databases at the National Center for Biotechnology Information (NCBI) received approximately 50 million web hits per day, at peak rates of about 1,900 hits per second. As these databases become more popular, there is increased demand to make them faster and more efficient. This thesis proposes a method for compressing and searching selected genome databases using techniques appropriate for computers of virtually any size.

This search technique is expected to produce its best results with large search sequences against large DNA databases, and lends itself to parallel computation techniques with little communication overhead required. Because the compression algorithm uses a lossless binary encoding format, search results are exact, not approximate. Furthermore, searches take place on the compressed data, obviating the need for decompression prior to executing a search.

This thesis provides background information on existing tools for sequence alignment, presents details of our proposed method, evaluates the performance of our algorithm by comparing it with WND-BLAST, and outlines directions of future work.
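
The abstract does not specify the encoding format, so the following is only a minimal sketch of the general idea it describes: a lossless binary encoding of DNA that can be searched for exact matches without decompressing. It assumes a common 2-bit-per-base code (A=0, C=1, G=2, T=3); the function names pack, unpack, and find_all are hypothetical and are not taken from the thesis.

```python
# Illustrative sketch only. The 2-bit code and every name below are
# assumptions; the thesis's actual encoding format is not given in the abstract.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into an integer, 2 bits per base.

    The first base ends up in the highest-order bits.
    """
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Inverse of pack. The length is needed because leading 'A's encode as zero bits."""
    return "".join(
        BASES[(value >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


def find_all(db, query):
    """Return every offset at which query occurs exactly in db.

    Comparison is done on packed 2-bit windows, never on decoded text.
    """
    n, m = len(db), len(query)
    if m == 0 or m > n:
        return []
    packed_db = pack(db)
    packed_query = pack(query)
    mask = (1 << (2 * m)) - 1  # keeps the low 2*m bits, i.e. one window
    hits = []
    for pos in range(n - m + 1):
        # Shift so the window starting at pos sits in the low 2*m bits.
        shift = 2 * (n - m - pos)
        if (packed_db >> shift) & mask == packed_query:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(find_all("ACGTACGT", "CGT"))
    print(unpack(pack("AACGT"), 5))
```

Because the encoding is a bijection on fixed-length sequences, equality of packed windows implies equality of the underlying bases, which is why a search of this kind can be exact. The sketch is not optimized: each window comparison shifts the whole packed database, so a practical implementation would work on fixed-size machine words instead.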

Keywords/Search Tags:

Search, Method, Genomic, Databases

Related items

1	Identifying functional lox sequences: A genomic search and randomized libraries
2	Semantics Based Top-k Keyword Search Technology In Relational Databases
3	Research On Some Key Technologies Of Computer Visualization Of Genomic Information
4	The Key Techniques Of Deep Web Search Engine
5	Biological applications specific integrated circuits for genomic analysis
6	Unstructured search on structured databases
7	Genomic data mining enhanced by symbolic manipulation of Boolean functions
8	Enabling Third Party Services Over Deep Web Databases and Location Based Service
9	Technologies for keyword search in databases
10	Research On Graph Similarity Search On Uncertain Graph Databases