| In bioinformatics, interdisciplinary researchers study how to capture, manage, analyze, and disseminate biological information in the emerging drug discovery and disease management paradigms. Bioinformatics presents computer scientists with unparalleled challenges and opportunities. Current researchers and developers in bioinformatics face these challenges: (1) extracting and integrating biological data; (2) representing, managing, and reasoning about biological data; (3) integrating biological knowledge management with the biological discovery process; (4) studying large-scale biology in an interdisciplinary environment; and ultimately (5) enabling the discovery of encyclopedic biological knowledge. In this thesis, I describe the design of a discovery-oriented bioinformatics computing framework based on the extended relational database management system (E-RDBMS) architecture. The framework supports an integrated biological discovery process that combines the “data-driven” view and the “hypothesis-driven” view. It consists of two primary research components, genomic data modeling and complex query modeling, and one extension component, the Similar_Join operator. In genomic data modeling, I have enabled biological data analysts to represent biological sequence data by extending data modeling techniques from content-neutral domains to the genomics domain. In complex query modeling, I have enabled biological data analysts to accomplish complex database querying by inventing new notation and techniques. In the Similar_Join operator design, I have enabled biology users to shift the computational burden of a routine bioinformatics task (batch sequence similarity searches) back to the computing system by extending the general-purpose database management system into the biology domain. Using the framework, I have accomplished real-world bioinformatics research projects, which I present as three bioinformatics case studies.
Working with three bioinformatics development teams, I have enabled high-throughput and high-performance (1) annotation of anonymous plant ESTs, (2) comparison of gene expression detection algorithms, and (3) sequence selection for three Affymetrix commercial GeneChip microarrays. Overall, I have contributed to both computer science and bioinformatics: I have extended existing database system research into the emerging field of bioinformatics, and I have enabled interdisciplinary teams of biologists and computational scientists to perform “large-scale integrated biology”. I believe the framework will inspire the development of new bioinformatics platforms, which will lead to the ultimate discovery of the Holy Grail in biology. |