
Data Management And Integration For XML-Based Semi-Structured Data

Posted on: 2003-11-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Y Nie
GTID: 1118360092966132
Subject: Computer software and theory
Abstract/Summary:
In recent years there has been increasing interest in managing data that does not conform to traditional data models, such as the relational or object-oriented data models. Consequently, the management and integration of semistructured data have become an important research topic in databases. In particular, the Extensible Markup Language (XML) has emerged as a simple, practical standard for modeling and exchanging semistructured data over the World Wide Web without the rigid constraints of traditional database systems. XML's emergence is also exciting because it is expected to turn the WWW into a database. Migrating semistructured data on the Web to XML is therefore an important way to manage and integrate semistructured data over the WWW, and data management and integration for XML-based semistructured data has become an active research topic in the international database community.
In this thesis, we present an in-depth study of issues related to data management and integration for XML-based semistructured data, including data models, query languages, schema formalisms and schema extraction, view maintenance, storage, and data integration. Our research project started in 1999. Because research on XML-based semistructured data is relatively recent and only a few of these topics have been addressed, our goal is to carry out preliminary work on theoretical and foundational issues, laying a solid groundwork for the further development of practical systems. The main research work and specific contributions of this thesis cover the following aspects:
1. An appropriate data model for XML-based semistructured data is required before a DBMS can be built. We first develop a specification syntax for semistructured data that deliberately evokes Lisp's s-expressions; with this syntax, semistructured data can be described in a uniform, formal way. Second, we propose an extensible XML-based semistructured data model for capturing and querying metadata properties, and we present a mapping from XML data to this model. With this model, the metadata (properties) of semistructured data and the attributes of XML elements can be represented uniformly.
2. A significant portion of this thesis is devoted to querying XML-based semistructured data. We first present a number of desiderata for an XML-based query language and, based on these criteria, introduce the syntax of a simple core language for semistructured data, then describe four extensions that have resulted in working prototypes. Second, we present an algorithm for computing the result of a regular expression on a data graph with cycles, and a first-order interpretation of a query language for semistructured data. We also explore structural recursion and bisimulation in semistructured data and propose an efficient, systematic method for computing a bisimulation between two graphs. In addition, we propose and implement a Web querying system with database features.
3. Schemas for semistructured data, however, differ from those for relational or object-oriented data. We therefore study type systems and schema formalisms for XML-based semistructured data extensively, and propose algorithms for computing, respectively, type classification under maximal-fixpoint semantics and maximal simulations. We also study schema extraction from semistructured data and the description of XML schemas, proposing an algorithm for extracting Datalog rules from data and a formal description of OEM-based XML DTD schemas.
4. Views increase the flexibility of a DBMS by adapting the data to user or application needs. Defining views over XML-based semistructured data can be more complex than in traditional DBMSs. In this thesis we study a view specification language suitable for XML-based semistructured data, and we investigate incremental maintenance of materialized views over semistructured...
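Item 1 describes a Lisp-style specification syntax and a model in which XML attributes and element content are represented uniformly. The thesis's exact syntax is not reproduced in this abstract, so the following Python sketch is only an illustration under our own assumptions: the function name `to_sexpr` and the output format are hypothetical. It renders an XML element as a nested s-expression in which attributes appear as ordinary labeled children, next to subelements.

```python
import json
import xml.etree.ElementTree as ET


def to_sexpr(elem: ET.Element) -> str:
    """Render an XML element as a Lisp-style labeled tree (hypothetical format).

    Attributes are emitted as ordinary labeled children, so attributes and
    subelements end up in one uniform representation.
    """
    parts = [elem.tag]
    # Attributes become (name "value") pairs, like leaf subelements.
    for name, value in elem.attrib.items():
        parts.append(f"({name} {json.dumps(value)})")
    # Non-whitespace text becomes an atomic string value; json.dumps is used
    # here only as a convenient way to quote and escape strings.
    text = (elem.text or "").strip()
    if text:
        parts.append(json.dumps(text))
    # Subelements are rendered recursively; tail text is ignored.
    for child in elem:
        parts.append(to_sexpr(child))
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring('<paper year="2003"><title>XML views</title></paper>')
print(to_sexpr(doc))  # (paper (year "2003") (title "XML views"))
```

Mixed content (text between subelements, which ElementTree stores in `.tail`) is dropped for brevity; a fuller mapping would need to keep it.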
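Item 2 mentions computing the result of a regular expression on a data graph that may contain cycles. The thesis's own algorithm is not given here; a common technique, shown below as a Python sketch, explores the product of the data graph and a finite automaton for the path expression, keeping a visited set of (node, state) pairs so that cycles cannot cause non-termination. The graph encoding, the hand-built automaton, and the name `eval_path` are assumptions for illustration.

```python
from collections import deque

# Hypothetical encodings: a data graph maps each node to its labeled out-edges,
# and an NFA is (transition table, start state, accepting states).
Graph = dict[str, list[tuple[str, str]]]
NFA = tuple[dict[tuple[int, str], set[int]], int, set[int]]


def eval_path(graph: Graph, start: str, nfa: NFA) -> set[str]:
    """Return the nodes reachable from `start` along a path whose label
    sequence is accepted by `nfa`; terminates even on cyclic graphs."""
    delta, q0, accepting = nfa
    seen = {(start, q0)}
    queue = deque([(start, q0)])
    result: set[str] = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            result.add(node)
        for label, target in graph.get(node, []):
            for nxt in delta.get((state, label), ()):
                pair = (target, nxt)
                if pair not in seen:  # each (node, state) pair is visited once
                    seen.add(pair)
                    queue.append(pair)
    return result


# A cyclic graph: s1 and s2 point to each other via "section" edges.
graph: Graph = {
    "r": [("section", "s1")],
    "s1": [("section", "s2"), ("title", "t1")],
    "s2": [("section", "s1"), ("title", "t2")],
}
# NFA for the path expression section*.title (state 0 loops on "section").
section_star_title: NFA = ({(0, "section"): {0}, (0, "title"): {1}}, 0, {1})

print(sorted(eval_path(graph, "r", section_star_title)))  # ['t1', 't2']
```

Because each (node, state) pair is enqueued at most once, the traversal touches at most |nodes| × |states| pairs, however many cycles the graph has.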
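Item 3 mentions type classification under maximal-fixpoint semantics and maximal simulation. One well-known formulation, sketched here in Python under our own assumptions rather than as the thesis's algorithm, assigns a data node to a schema type when the greatest simulation relates them: every outgoing edge of the data node must be matched by a same-labeled edge of the type whose target, in turn, simulates the data edge's target. The schema encoding, the names `nodes_of` and `classify`, and the choice to ignore atomic values are hypothetical.

```python
# Hypothetical encoding shared by data and schema: node -> [(label, target)].
Graph = dict[str, list[tuple[str, str]]]


def nodes_of(g: Graph) -> set[str]:
    """Nodes that appear as sources or as edge targets."""
    return set(g) | {t for edges in g.values() for _, t in edges}


def classify(data: Graph, schema: Graph) -> dict[str, set[str]]:
    """Map each data node to the schema types that simulate it, using the
    greatest simulation (computed by removing violating pairs to a fixpoint)."""
    sim = {(o, t) for o in nodes_of(data) for t in nodes_of(schema)}
    changed = True
    while changed:
        changed = False
        for o, t in list(sim):
            # Every data edge o -lab-> o2 needs a schema edge t -lab-> t2
            # with (o2, t2) still in the relation.
            ok = all(
                any(s_lab == lab and (o2, t2) in sim for s_lab, t2 in schema.get(t, []))
                for lab, o2 in data.get(o, [])
            )
            if not ok:
                sim.discard((o, t))
                changed = True
    return {o: {t for (p, t) in sim if p == o} for o in nodes_of(data)}


schema: Graph = {"Person": [("name", "Str"), ("email", "Str")]}
data: Graph = {
    "p1": [("name", "v1")],
    "p2": [("name", "v2"), ("phone", "v3")],
}
types = classify(data, schema)
print(types["p1"], types["p2"])  # {'Person'} set()
```

Here `p2` receives no type because its `phone` edge has no counterpart in `Person`, while leaves without outgoing edges are vacuously compatible with every type; a fuller model would also compare atomic values.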
Keywords/Search Tags: XML-based Semistructured Data, XML Data Model, Query Language, Schema, View, Data Integration