
Supporting on-the-fly data integration for bioinformatics

Posted on: 2008-09-25
Degree: Ph.D.
Type: Dissertation
University: The Ohio State University
Candidate: Zhang, Xuan
Full Text: PDF
GTID: 1448390005477910
Subject: Biology
Abstract/Summary:
Computational tools and online knowledge bases have changed the way biologists conduct their research, and the fusion of biology and information science is expected to continue. Data integration is one of the challenges faced by bioinformatics. To build an integration system for modern biological research, three problems have to be solved. First, a large number of existing data sources have to be incorporated, and newly discovered data sources should be usable right away. Second, the variety of biological data formats and access methods has to be addressed. Finally, the system has to be able to understand the rich and often fuzzy semantics of biological data.

Motivated by these challenges, we have implemented a system and a set of tools to support on-the-fly integration of biological data. Metadata about the underlying data sources is the backbone of the system. Data mining tools have been developed to help users write the metadata descriptors semi-automatically. Using an automatic code generation approach, we have developed several tools for bioinformatics integration needs. An automatic data wrapper generation tool transforms data between heterogeneous data sources. Another code generation system creates programs that answer projection, selection, cross product, and join queries over flat file data.

Real bioinformatics requests have been used to test our system and tools. These case studies show that our approach can reduce the human effort involved in an information integration system. Specifically, it makes the following contributions.
(1) Data mining tools allow new data sources to be understood with ease and integrated into the system on-the-fly.
(2) Changes in data format are localized by the metadata descriptors, so system maintenance cost is low.
(3) Users interact with our system through high-level declarative interfaces, which reduces programming effort.
(4) Our tools process data directly from flat files and require no database support. Data parsing and processing are done implicitly.
(5) Request analysis and request execution are separated, so our tools can be used in a data grid environment.
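
To make the query operations named in the abstract concrete, the following is a minimal sketch of selection, projection, and join evaluated directly over flat file text, with no database involved. It is not the dissertation's generated code: the function names, the tab-delimited layout, and the sample gene and score records are all assumptions made up for illustration.

```python
import csv
import io


def read_flat(text, delimiter="\t"):
    """Parse delimited flat-file text (with a header row) into dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def select(rows, predicate):
    """Selection: keep only the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]


def project(rows, fields):
    """Projection: keep only the named fields of each row."""
    return [{f: r[f] for f in fields} for r in rows]


def join(left, right, key):
    """Equi-join on a shared key, using a hash index over the right input."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical sample data standing in for two flat files.
GENES = "gene_id\tsymbol\nG1\tBRCA1\nG2\tTP53\n"
HITS = "gene_id\tscore\nG1\t0.9\nG2\t0.4\nG1\t0.7\n"

genes = read_flat(GENES)
hits = read_flat(HITS)

# Select high-scoring hits, join them to gene records, project two fields.
strong = select(hits, lambda r: float(r["score"]) > 0.5)
result = project(join(genes, strong, "gene_id"), ["symbol", "score"])
```

In this sketch the parsing step is hidden inside read_flat, so the query itself is written only in terms of field names, which loosely mirrors the abstract's point that data parsing is done implicitly.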
Keywords/Search Tags:Data, Tools, Integration, System, On-the-fly, Bioinformatics