
Query Processing And Optimization In Heterogeneous Information Integration

Posted on: 2005-01-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R X Li
GTID: 1118360152469127
Subject: Computer application technology
Abstract/Summary:
Computer and network technologies have developed rapidly in recent years. Yet data, the core of every application, are still stored in different systems in different ways and remain isolated in distributed, heterogeneous environments. As application requirements steadily grow, more and more users want to access and manipulate useful information across multiple large information sources and to achieve interoperability among multiple computer systems and information sources. However, these data sources may not only be located geographically in multiple autonomous domains of a heterogeneous environment, with different data formats, storage modes, and access control policies, but also differ logically from one another in data models, manipulation languages, and data semantics. Moreover, the sharing capabilities, modes, and contents of the sources may change at any time. Designing a heterogeneous information integration system (HIIS) that supports a common data model and a uniform query language is therefore a better way to implement this kind of interoperation. An HIIS can hide most of the differences in access methods and user interfaces among multiple heterogeneous data management systems. It also provides an information interoperation platform that serves as a common interface for accessing multiple heterogeneous data sources and combining the intermediate query results from these sources.

Data interoperability is one of the main problems in heterogeneous information integration. There are two approaches to integrating and interoperating multiple data sources in a distributed, heterogeneous environment: the federated database system and the multidatabase system. Each has advantages and disadvantages. The dissertation presents a multi-domain-based hierarchy interoperation (MDHI) model that merges these two approaches.
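As a rough illustration of the "common interface" role described above, the following Python sketch places a mediator over per-source wrappers. It is a minimal sketch under stated assumptions, not the dissertation's design: the classes `SqlWrapper`, `CsvWrapper`, and `Mediator`, the `fetch(attrs)` method, and the sample records are all hypothetical. Each wrapper hides its source's access method (SQL for a relational source, CSV parsing for a flat file) behind the same call, and the mediator combines the intermediate results.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, List


class SqlWrapper:
    """Hypothetical wrapper hiding a relational source behind fetch()."""

    def __init__(self, conn: sqlite3.Connection, table: str):
        self.conn = conn
        self.table = table

    def fetch(self, attrs: List[str]) -> List[Dict[str, str]]:
        # Attribute and table names are assumed trusted in this sketch.
        cols = ", ".join(attrs)
        cur = self.conn.execute(f"SELECT {cols} FROM {self.table}")
        return [dict(zip(attrs, row)) for row in cur.fetchall()]


class CsvWrapper:
    """Hypothetical wrapper hiding a flat-file source behind fetch()."""

    def __init__(self, text: str):
        self.text = text

    def fetch(self, attrs: List[str]) -> List[Dict[str, str]]:
        reader = csv.DictReader(io.StringIO(self.text))
        return [{a: row[a] for a in attrs} for row in reader]


class Mediator:
    """Common interface: send one request to every wrapper and union the results."""

    def __init__(self, wrappers: Iterable):
        self.wrappers = list(wrappers)

    def query(self, attrs: List[str]) -> List[Dict[str, str]]:
        combined: List[Dict[str, str]] = []
        for w in self.wrappers:
            combined.extend(w.fetch(attrs))  # intermediate result from one source
        return combined


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, dept TEXT)")
    conn.execute("INSERT INTO staff VALUES ('Ann', 'DB')")
    mediator = Mediator([SqlWrapper(conn, "staff"), CsvWrapper("name,dept\nBo,Web\n")])
    print(mediator.query(["name", "dept"]))
```

The caller never sees SQL or CSV; a real HIIS would also translate between data models and handle sources that change or become unavailable, which this sketch ignores.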
The framework based on the MDHI model not only meets the efficiency requirements of information integration and processing within local areas, but also provides a method for integrating multiple heterogeneous data sources in wide-area environments, and so meets real-world application requirements much better.

The local schemas of the local data sources in an HIIS differ from one another, so the dissertation presents an XML-based integration data model (XIDM) as the common data model for integrating these schemas. XIDM describes the export and global schemas as graph structures, which can integrate data from multiple heterogeneous systems such as database systems, file systems, and web information systems. The dissertation also gives the global mappings between the global schemas and the export schemas, and the local mappings between the export schemas and the local schemas. These mappings solve the problem of transforming the XIDM model to the relational data model, the object-oriented model, and the HTML/XML document model, and vice versa. Examples demonstrate the effectiveness and efficiency of the XIDM model and the schema mapping approach.

Query processing is one of the key techniques in an HIIS, and query decomposition, scheduling, and optimization are the central problems of query processing. The dissertation first defines the basic concepts of query processing and gives an architecture for query processing in an HIIS. After analyzing the characteristics and requirements of XML queries, we choose XQuery as the query language for the XIDM model. On this basis, the basic principles and an algorithm for global query decomposition are given, and the semantic equivalence of the algorithm is also discussed.

Post-query processing is the process of scheduling the query execution plan and combining the intermediate results according to the post-processing operations. These operations consist of the inter-site operations related to the global queries.
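To make the two-level mapping and the idea of global query decomposition more concrete, here is a small Python sketch. It does not reproduce the dissertation's XQuery-based decomposition algorithm; it only assumes that a global request can be reduced to a list of global-schema paths. The tables `GLOBAL_MAP` (global path to source and export path) and `LOCAL_MAP` (export path to a local-schema locator such as a relational column) and the function `decompose` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical global mapping: global-schema path -> (source, export-schema path).
GLOBAL_MAP: Dict[str, Tuple[str, str]] = {
    "/university/staff/name": ("hr_db", "/staff/name"),
    "/university/staff/dept": ("hr_db", "/staff/dept"),
    "/university/staff/homepage": ("web", "/page/url"),
}

# Hypothetical local mapping: (source, export path) -> local-schema locator.
LOCAL_MAP: Dict[Tuple[str, str], str] = {
    ("hr_db", "/staff/name"): "STAFF.NAME",  # relational column
    ("hr_db", "/staff/dept"): "STAFF.DEPT",  # relational column
    ("web", "/page/url"): "a/@href",         # attribute in an HTML page
}


def decompose(global_paths: List[str]) -> Dict[str, List[str]]:
    """Split a global request into one sub-request per source,
    rewriting each global path into that source's local terms."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for gpath in global_paths:
        source, export_path = GLOBAL_MAP[gpath]                  # global mapping
        plan[source].append(LOCAL_MAP[(source, export_path)])   # local mapping
    return dict(plan)


if __name__ == "__main__":
    print(decompose(["/university/staff/name", "/university/staff/homepage"]))
```

Grouping by source yields one subquery per site; the inter-site work (joining or merging the returned pieces) is what the post-query processing step would then schedule, and is not modeled here.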
The dissertation extends the operations of relational algebra to define XIDM-oriented, path-based operations on element clusters, called XIDM r...
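The summary breaks off before listing the actual operators, so the following Python sketch only shows what relational-algebra analogs over XML elements might look like. It assumes an "element cluster" is simply a list of `xml.etree.ElementTree` elements and that paths are ElementTree relative paths; the functions `select`, `project`, and `join` are hypothetical stand-ins, not the dissertation's XIDM operators.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable, List

Cluster = List[ET.Element]  # assumption: an element cluster is a list of elements


def select(cluster: Cluster, path: str, pred: Callable[[str], bool]) -> Cluster:
    """Selection analog: keep elements whose text at `path` satisfies `pred`."""
    out = []
    for elem in cluster:
        value = elem.findtext(path)
        if value is not None and pred(value):
            out.append(elem)
    return out


def project(cluster: Cluster, paths: List[str]) -> Cluster:
    """Projection analog: copy each element, keeping only sub-elements at `paths`."""
    out = []
    for elem in cluster:
        new = ET.Element(elem.tag)
        for p in paths:
            child = elem.find(p)
            if child is not None:
                new.append(copy.deepcopy(child))
        out.append(new)
    return out


def join(left: Cluster, right: Cluster, lpath: str, rpath: str, tag: str = "pair") -> Cluster:
    """Equi-join analog: pair elements whose texts at lpath and rpath are equal."""
    out = []
    for lhs in left:
        key = lhs.findtext(lpath)
        for rhs in right:
            if key is not None and key == rhs.findtext(rpath):
                pair = ET.Element(tag)
                pair.append(copy.deepcopy(lhs))
                pair.append(copy.deepcopy(rhs))
                out.append(pair)
    return out


if __name__ == "__main__":
    people = list(ET.fromstring(
        "<staff><person><name>Ann</name><dept>DB</dept></person>"
        "<person><name>Bo</name><dept>Web</dept></person></staff>"))
    units = list(ET.fromstring("<depts><unit><code>DB</code><head>Li</head></unit></depts>"))
    db_staff = select(people, "dept", lambda v: v == "DB")
    print([ET.tostring(e, encoding="unicode") for e in project(db_staff, ["name"])])
    print(len(join(people, units, "dept", "code")))
```

The nested-loop join is the simplest correct choice for a sketch; an integration system would normally pick join order and method during query optimization.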
Keywords/Search Tags: Heterogeneous information integration system, Integration data model, Schema mapping, Query processing, Query decomposition, Query scheduling, Query optimization, Multiple autonomous domains