Semi-structured Data Integration System, Query Processing Research

Posted on:2005-02-19

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C Tao

Full Text:PDF

GTID:1118360125467530

Subject:Computer software

Abstract/Summary:

Driven by the rapid development of the Internet and electronic commerce, the volume and variety of information an enterprise can access have increased greatly in recent years. The ever-growing amount of data on the Web also poses new challenges for information access. Research on data integration has flourished in response to these needs.

Data integration is the problem of combining data residing at different sources and providing users with a unified view of these data. Researchers have made considerable progress in this field, but the area is so rich that many problems remain open, especially as new techniques continue to enter it, and data integration has remained an active research topic throughout this decade. Because users usually access data via queries, and data integration systems often describe data sources as views over the global schema, query processing is one of the core problems of data integration. Data on the Web has also motivated research on semi-structured data.

In this thesis, we focus on query processing in semi-structured data integration systems and study two problems. The first is the maximal TSL (Tree Specification Language) query rewriting problem in OEM (Object Exchange Model) semi-structured data integration systems. The second is the generation and optimization of maximal query execution plans in ontology-based XML integration systems.

For the first problem, we first formalize the concepts of query containment, query equivalence, and maximally contained query rewriting in TSL-based OEM semi-structured data integration systems. Within this framework, we propose a TSL-based semi-structured query rewriting algorithm that borrows ideas from MiniCon, a scalable algorithm for relational query rewriting. Finally, we prove the correctness of the algorithm.
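
As an illustration of the query containment notion mentioned above, the following is a minimal sketch for plain relational conjunctive queries, not for TSL or OEM, whose formalization the thesis develops separately. It uses the classical containment-mapping test: Q1 is contained in Q2 when the variables of Q2 can be mapped to terms of Q1 so that the head of Q2 maps to the head of Q1 and every body atom of Q2 maps to some body atom of Q1. The function name `contained_in`, the tuple encoding of atoms, the `edge` predicate, and the convention that variables start with an uppercase letter are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# An atom is (predicate_name, argument_tuple); this encoding is hypothetical.
Atom = Tuple[str, Tuple[str, ...]]


def is_var(term: str) -> bool:
    # Assumed convention: variables start with an uppercase letter,
    # anything else is treated as a constant.
    return term[:1].isupper()


def extend(mapping: Dict[str, str], src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Try to extend `mapping` so that the terms in src map onto dst."""
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            # A variable must always map to the same target term.
            if new.setdefault(s, d) != d:
                return None
        elif s != d:
            # Constants can only map to themselves.
            return None
    return new


def contained_in(q1_head: Tuple[str, ...], q1_body: List[Atom],
                 q2_head: Tuple[str, ...], q2_body: List[Atom]) -> bool:
    """Return True if Q1 is contained in Q2 (a containment mapping Q2 -> Q1 exists)."""
    if len(q1_head) != len(q2_head):
        return False
    start = extend({}, q2_head, q1_head)
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        # Backtracking search: map the i-th atom of Q2 onto some atom of Q1.
        if i == len(q2_body):
            return True
        pred, args = q2_body[i]
        for p, a in q1_body:
            if p == pred and len(a) == len(args):
                nxt = extend(mapping, args, a)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


# Q1(X) :- edge(X, Y), edge(Y, X)   (nodes on a 2-cycle)
q1_head, q1_body = ("X",), [("edge", ("X", "Y")), ("edge", ("Y", "X"))]
# Q2(A) :- edge(A, B)               (nodes with an outgoing edge)
q2_head, q2_body = ("A",), [("edge", ("A", "B"))]

q1_in_q2 = contained_in(q1_head, q1_body, q2_head, q2_body)
q2_in_q1 = contained_in(q2_head, q2_body, q1_head, q1_body)
```

In this example every node on a 2-cycle has an outgoing edge, so Q1 is contained in Q2, while the converse fails. The brute-force search is exponential in the worst case, which is expected, since containment of conjunctive queries is NP-complete.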
For the second problem, we first formalize ontology-based XML data integration systems. Within this formal framework, we propose an algorithm for generating maximal query execution plans, introduce the concept of incomplete roles and present an optimized algorithm based on them, and then propose a network cost optimization algorithm for query execution plans. We also prove the correctness of these algorithms.

The research in this thesis is based on a national science foundation project, Key Techniques of Digital Libraries. In this project, I worked on the design and implementation of the interoperability interface and query processing. The problems studied in this thesis are based on the work...

Keywords/Search Tags:

Semi-structured

Related items

1	Research On Topic-oriented Semi-structured Data Integration Methods
2	Semi-structured Data Integration System, Query Processing Research
3	Research And Application Of Conversion Between XML Which Is A Kind Of Semi-Structured Data And Structured DB
4	Information Extraction For Semi-structured Chinese Resume
5	Identification Of The Semi-Structured Text
6	Semantic Based Information Retrieval From Semi-structured Documents
7	Modeling Research Based On 2.5D And Characterization For Semi-Structured Objects
8	Curriculum Vitae Recognition System Base On Identification Of Semi-Structured Text
9	Research And Application Of Semi-structured Data Extraction
10	Research On Integrated Technology Of Semi-structured Data