
The design and implementation of an extended database system to support biological sequence similarity analysis

Posted on: 1997-01-24
Degree: Ph.D.
Type: Dissertation
University: University of Minnesota
Candidate: Shoop, Elizabeth Grace
Full Text: PDF
GTID: 1468390014481146
Subject: Computer Science
Abstract/Summary:
Molecular biology researchers generate gene sequence data so quickly that production is outpacing their ability to characterize the function each sequence performs in the cell. A faster way to characterize new sequences is to use similarity algorithms to compare them to known sequences. For large-scale sequencing projects, however, biologists face two problems with this technique: (1) they have too many sequences on which to execute similarity algorithms manually, and (2) the tremendous amount of textual output that results from running these algorithms is impossible to interpret manually. To solve these problems, we present the design and implementation of a Similarity Analysis Database System, which we developed during a cross-disciplinary research project between computer scientists and molecular biologists. The contributions of this work, to both computer science and computational biology research, are: (1) we have developed a DBMS-independent conceptual data schema for representing general information about the many different similarity algorithms, their execution parameters, and the results of those executions; (2) we have developed a processing system that automates the difficult task of executing similarity algorithms on the tens of thousands of sequences generated annually by researchers on our project, and we provide the similarity results to the rest of the community via index search on our WWW site; (3) we have stored these similarity results in a database patterned after the conceptual schema, using an extensible DBMS; (4) we have extended the DBMS with additional functions that facilitate faster and more complex interpretation of the similarities detected by the algorithms; and (5) we show the value of these functions by reporting interesting results from several analyses that we have conducted on the similarity data.
Because the system is faster and easier to use than manual analysis, biologists can now manage the previously insurmountable task of analyzing similarities across the large amounts of sequence data that they produce. We designed the system for long-term use by keeping it general and by giving biologists the ability to compare results using different sets of criteria. The system thus empowers scientists to explore similarity data in ways that were not possible before.
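
As a rough illustration only, and not the dissertation's actual schema, the idea of storing similarity algorithms, their execution parameters, and their results in one database, then extending the DBMS with an interpretation function, might be sketched as follows. SQLite stands in here for the unnamed extensible DBMS, and every table, column, function name, parameter string, and data value is a hypothetical placeholder.

```python
import sqlite3

# Hypothetical three-table layout: algorithms, their executions
# (with parameters), and the hits each execution produced.
SCHEMA = """
CREATE TABLE algorithm (
    algorithm_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE execution (
    execution_id INTEGER PRIMARY KEY,
    algorithm_id INTEGER NOT NULL REFERENCES algorithm(algorithm_id),
    query_seq    TEXT NOT NULL,   -- identifier of the new sequence
    parameters   TEXT NOT NULL    -- execution parameters as free text
);
CREATE TABLE hit (
    hit_id       INTEGER PRIMARY KEY,
    execution_id INTEGER NOT NULL REFERENCES execution(execution_id),
    subject_seq  TEXT NOT NULL,   -- identifier of the known sequence
    score        REAL NOT NULL
);
"""


def build_db():
    """Create an in-memory database and register a user-defined function."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # A stand-in for a DBMS extension: a function callable from SQL that
    # applies a caller-chosen criterion to a similarity score.
    conn.create_function(
        "passes_cutoff", 2, lambda score, cutoff: int(score >= cutoff)
    )
    return conn


def load_example(conn):
    """Insert a few made-up rows so the query below has something to return."""
    conn.executemany(
        "INSERT INTO algorithm VALUES (?, ?)",
        [(1, "algorithm_a"), (2, "algorithm_b")],
    )
    conn.executemany(
        "INSERT INTO execution VALUES (?, ?, ?, ?)",
        [(1, 1, "seq001", "param_x=11"), (2, 2, "seq001", "param_y=6")],
    )
    conn.executemany(
        "INSERT INTO hit VALUES (?, ?, ?, ?)",
        [(1, 1, "known_A", 120.0), (2, 1, "known_B", 35.0), (3, 2, "known_A", 98.5)],
    )


def strong_hits(conn, cutoff):
    """Return hits from all algorithms whose score meets the given cutoff."""
    return conn.execute(
        """
        SELECT a.name, e.query_seq, h.subject_seq, h.score
        FROM hit AS h
        JOIN execution AS e ON e.execution_id = h.execution_id
        JOIN algorithm AS a ON a.algorithm_id = e.algorithm_id
        WHERE passes_cutoff(h.score, ?)
        ORDER BY h.score DESC
        """,
        (cutoff,),
    ).fetchall()


if __name__ == "__main__":
    db = build_db()
    load_example(db)
    for row in strong_hits(db, 50.0):
        print(row)
```

Changing the cutoff passed to `strong_hits` is a small-scale analogue of comparing results under different sets of criteria without rerunning the algorithms themselves.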
Keywords/Search Tags: Data, Similarity, System, Sequence