Supporting on-the-fly data integration for bioinformatics

Posted on:2008-09-25

Degree:Ph.D

Type:Dissertation

University:The Ohio State University

Candidate:Zhang, Xuan

GTID:1448390005477910

Subject:Biology

Abstract/Summary:

The use of computational tools and online knowledge bases has changed the way biologists conduct their research, and the fusion of biology and information science is expected to continue. Data integration is one of the challenges faced by bioinformatics. To build an integration system for modern biological research, three problems have to be solved. First, a large number of existing data sources have to be incorporated, and newly discovered data sources should be usable right away. Second, the variety of biological data formats and access methods has to be addressed. Finally, the system has to be able to understand the rich and often fuzzy semantics of biological data.

Motivated by these challenges, a system and a set of tools have been implemented to support on-the-fly integration of biological data. Metadata descriptors of the underlying data sources are the backbone of the system. Data mining tools have been developed to help users write these descriptors semi-automatically. Using an automatic code generation approach, we have developed several tools for bioinformatics integration needs. An automatic data wrapper generation tool is able to transform data between heterogeneous data sources. Another code generation system can create programs to answer projection, selection, cross product and join queries over flat-file data.

Real bioinformatics requests have been used to test our system and tools. These case studies show that our approach can reduce the human effort involved in an information integration system. Specifically, it makes the following contributions. (1) Data mining tools allow new data sources to be understood with ease and integrated into the system on-the-fly. (2) Changes in data format are localized by using the metadata descriptors, so system maintenance cost is low. (3) Users interact with our system through high-level declarative interfaces, which reduces programming effort. (4) Our tools process data directly from flat files and require no database support. Data parsing and processing are done implicitly. (5) Request analysis and request execution are separated, so our tools can be used in a data grid environment.

Keywords/Search Tags:

Data, Tools, Integration, System, On-the-fly, Bioinformatics

Related items

1	Information integration in a grid environment Applications in the bioinformatics domain
2	Research On Data Integration Techniques In The Context Of The Tools Integration
3	Study On Heterogeneous Bioinformatics Database Integration
4	Data-mart integration of the proteome
5	Enhanced bioinformatics data modeling concepts and their use in querying and integration
6	Researches On The Heterogeneous Data Integration System And Its Supporting Tools Prototype
7	Application Research On Data Integration In Bioinformatics
8	Design And Implementation Of Testing Tools Integration System Based On RCP
9	A Study For Bioinformatics Application System
10	Research On The Key Technology Of Metadata-based Integration For Proteomics Data Resources And The Development Of The Application Platform