Sequence and structure similarity search in biological and XML databases

Posted on: 2006-12-31
Degree: Ph.D
Type: Dissertation
University: University of California, Santa Barbara
Candidate: Aghili, S. Alireza
Full Text: PDF
GTID: 1458390008954093
Subject: Computer Science
Abstract/Summary:
The unprecedented growth of the Internet and of biological databases has introduced challenging and complex data formats, and has hence furnished unique collaborative venues for scientists of various disciplines. The set of such complex databases includes (1) XML (eXtensible Markup Language) databases, (2) DNA and protein sequence and structure databases, (3) microarray gene expressions, (4) biomedical images, and (5) sensor data stream and time series databases. Given a source query pattern and a target database, a similarity search (posed as a range query or a top-k query) seeks to identify the records of the database that match the given query. The problem of similarity search in biological and textual databases has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to address scalability issues and mitigate the curse of dimensionality. However, complex applications demand special customization based on the inherent, underlying dynamics of the data. In this work, we study the integration of various transformation and shape summarization techniques on biological sequence and protein structure data, as well as path encoding in tree-structured XML data, for more efficient processing of similarity search queries.
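
The two query types named in the abstract can be illustrated with a brute-force baseline. This is only a sketch and not the dissertation's method: it assumes plain string records and uses edit distance as the similarity measure, both of which are illustrative choices, and the function names are hypothetical. It scans every record, which is exactly the cost that filtration and indexing techniques aim to avoid.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed similarity measure)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def range_query(query: str, database: list[str], radius: int) -> list[str]:
    """Return every record within `radius` edits of the query."""
    return [rec for rec in database if edit_distance(query, rec) <= radius]


def top_k_query(query: str, database: list[str], k: int) -> list[str]:
    """Return the k records closest to the query, nearest first."""
    return heapq.nsmallest(k, database, key=lambda rec: edit_distance(query, rec))


if __name__ == "__main__":
    db = ["ACGT", "ACGA", "TTTT", "ACG"]  # toy data, not from the dissertation
    print(range_query("ACGT", db, radius=1))
    print(top_k_query("ACGT", db, k=2))
```

A range query returns a result set whose size depends on the data, while a top-k query always returns at most k records regardless of how far away they are.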
Keywords/Search Tags:Data, Similarity search, XML, Biological, Structure, Sequence, Query