The design and implementation of an extended database system to support biological sequence similarity analysis

Posted on:1997-01-24

Degree:Ph.D.

Type:Dissertation

University:University of Minnesota

Candidate:Shoop, Elizabeth Grace

Full Text:PDF

GTID:1468390014481146

Subject:Computer Science

Abstract/Summary:

Molecular biology researchers generate vast amounts of gene sequence data so quickly that sequencing is outpacing their ability to characterize what function each sequence performs in the cell. A faster means of characterizing new sequences is to use similarity algorithms to compare them to known sequences. For large-scale sequencing projects, however, the biologists' problems with this technique are twofold: (1) they have too many sequences on which to manually execute similarity algorithms, and (2) the tremendous amount of textual data that results from running these algorithms is impossible to interpret manually. To solve these problems, we present the design and implementation of a Similarity Analysis Database System, which we developed during a cross-disciplinary research project between computer scientists and molecular biologists. The contributions of this work, to both computer science and computational biology research, are: (1) we have developed a DBMS-independent conceptual data schema for representing general information about the many different similarity algorithms, their execution parameters, and the results from performing those executions; (2) we have developed a processing system that automates the difficult task of performing similarity algorithm executions on the tens of thousands of sequences generated annually by researchers on our project, and we provide the similarity results to the rest of the community via an index search on our WWW site; (3) we have stored these similarity results in a database patterned after the conceptual schema, using an extensible DBMS; (4) we have extended the DBMS with additional functions that facilitate faster and more complex interpretation of similarities detected by the algorithms; and (5) we show the value of these functions by reporting interesting results from several analyses that we have conducted on similarity data.
Because the system is faster and easier to use than manual execution and interpretation, biologists are now able to overcome the previously insurmountable task of analyzing similarities for the large amounts of sequence data that they produce. We designed this system for long-term use by making it general and by giving biologists the ability to compare the results using different sets of criteria. The system thus empowers scientists to explore the similarity data in ways that were not possible before.

Keywords/Search Tags:

Data, Similarity, System, Sequence

Related items

1	Research On Transaction Sequence Data Mining
2	Research On Similarity Query Over Sequence Data
3	Sequence Recommendation Methods Based On Temporal Similarity Search
4	Research On Algorithm For Similarity Search Of Biological Sequence Database
5	Research Of Sequence Clustering Algorithm Based On Weighted Similarity
6	Sequence Similarity Based On Co-occurrence Word Frequence
7	Sequence and structure similarity search in biological and XML databases
8	Audio Similarity Model And Retrieval Based On Emotion
9	Research Of Data Mining On Dynamic Data
10	Research On Multi-modal Based Human Pose Recovery And Similarity Evaluation System