Research on Query Processing in Semi-structured Data Integration Systems

Posted on: 2005-02-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Tao
Full Text: PDF
GTID: 1118360125467530
Subject: Computer software
Abstract/Summary:
Driven by the rapid development of the Internet and electronic commerce, the volume and variety of information an enterprise can access have increased greatly in recent years. The ever-growing amount of data on the Web has also posed new challenges for information access. Research on data integration has flourished in response to these requirements.
Data integration is the problem of combining data residing at different sources and providing users with a unified view of these data. Researchers have made considerable progress in this field in recent years, but it is such a rich area that many problems still await solutions, especially as more and more new techniques enter the area. Data integration has remained a hot research topic throughout this decade. Because users usually access data via queries, and data integration systems often describe data sources as views on the global schema, query processing becomes one of the core problems of data integration. In addition, the growth of data on the Web has led to research on semi-structured data.
In this dissertation, we focus on query processing in semi-structured data integration systems and study two problems. The first is the maximal TSL (Tree Specification Language) query rewriting problem in OEM (Object Exchange Model) semi-structured data integration systems. The second is the generation and optimization of maximal query execution plans in ontology-based XML integration systems.
On the first problem, we first formalized concepts such as query containment, query equivalence, and maximal contained query rewriting in TSL-based OEM semi-structured data integration systems. Under this framework, we proposed a TSL-based semi-structured query rewriting algorithm that borrows ideas from MiniCon, a scalable algorithm for relational query rewriting. Finally, we proved the correctness of the algorithm theoretically.
On the second problem, we first formalized ontology-based XML data integration systems. Under this formalization, we proposed an algorithm for generating maximal query execution plans, introduced the concept of incomplete roles, and put forward an optimized algorithm based on incomplete roles. We then proposed a network cost optimization algorithm for query execution plans, and we proved the correctness of the algorithms.
The research in this dissertation is based on a national science foundation project, Key Techniques of Digital Libraries. In this project, I worked on the design and implementation of the interoperability interface and query processing. The problems studied in this dissertation are based on the work...
Keywords/Search Tags:Semi-structured