
Research on Multi-DTD-Based XML Query Technology

Posted on: 2004-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Lu
Full Text: PDF
GTID: 1118360095962830
Subject: Computer software and theory
Abstract/Summary:
XML (Extensible Markup Language) has become extremely popular as a flexible medium for data exchange and storage on the WWW. As XML use grows, so do the performance demands on XML queries. This paper focuses on XML queries over documents that are based on multiple DTDs within a given domain, and aims to solve the following problems: 1) How can an XML query be constructed without knowing the detailed structure of the XML documents to be searched? 2) How can a query be executed on XML documents that are based on multiple DTDs in a given domain? 3) How can a query be executed on XML documents whose structures are similar to that of the query?

Main achievements of the research include:

1. XML indexing has received much attention as a way to retrieve the results of path queries efficiently. This paper proposes DBXI (DTD-based XML Indexing), an XML indexing method that uses DTD information to speed up the evaluation of XML path queries. The main characteristics of DBXI are: 1) A new XML numbering scheme is adopted in which each element of an XML document carries the corresponding DTD structural information. 2) A path query with N elements (or attributes) and a predicate restriction can be evaluated with only 0 or 2 structural join operations per XML document, whereas XISS (an XML indexing system developed by Q. Li and B. Moon) needs at least N-1 structural join operations. 3) For a path expression that matches no path in the XML documents, DBXI can report that there is no answer in much less time than SphinX, XISS, etc. Experimental results demonstrate that DBXI processes path expressions much faster than Lore, SphinX, and XISS.

2. Techniques for XML queries based on multiple DTDs in a given domain, such as "finding candidate DTDs" and "ranking candidate DTDs against the structure of the user's query", are discussed in detail. A new idea is proposed: ranking by sub-distance between trees instead of distance between trees. This guarantees a more precise ranking output. A ranking algorithm with close to linear time complexity is presented, and proofs are given for the main algorithms.

3. A prototype for XML queries based on multiple DTDs in a given domain, named Smart XML Query, is presented. Smart XML Query can answer not only queries whose structures exactly match the documents to be searched, but also queries whose structures are similar to those documents.

4. This paper proposes a new method of storing XML data in a relational database that makes use of DTD information. Compared with the methods proposed by J. Shanmugasundaram, Yan Men-hin, and others, which use DTDs similarly, the new storage method has the following characteristics: 1) Structural changes in XML documents do not lead to any schema change in the corresponding relational tables. 2) XML documents based on different DTDs can be kept in the same relational table. All DTDs are kept in a single table, and so are all XML documents. 3) Reconstruction of XML documents can be done with linear time complexity.
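
The N-1 structural join baseline mentioned in item 1 can be illustrated with a minimal sketch. It assumes a generic interval numbering in which each element receives a (start, end) pair from a depth-first traversal, so that one element is an ancestor of another exactly when its interval contains the other's. This is not DBXI's numbering scheme, which the abstract does not specify, and it is not necessarily the exact scheme used by XISS. The function names (number_elements, structural_join, evaluate_path) and the sample document are hypothetical, and the join is a naive nested loop chosen for clarity rather than speed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def number_elements(root):
    """Assign a (start, end) interval to every element by depth-first
    traversal and return a mapping: tag -> list of intervals.
    (Assumed generic interval numbering, not DBXI's scheme.)"""
    index = defaultdict(list)
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        start = counter
        for child in elem:
            visit(child)
        counter += 1
        index[elem.tag].append((start, counter))

    visit(root)
    return index


def structural_join(ancestors, descendants):
    """Keep each descendant interval that lies strictly inside
    at least one ancestor interval (naive nested-loop join)."""
    return [d for d in descendants
            if any(a[0] < d[0] and d[1] < a[1] for a in ancestors)]


def evaluate_path(index, tags):
    """Evaluate an ancestor-descendant path tags[0]//tags[1]//...
    using one structural join per step, i.e. len(tags) - 1 joins."""
    current = index.get(tags[0], [])
    joins = 0
    for tag in tags[1:]:
        current = structural_join(current, index.get(tag, []))
        joins += 1
    return current, joins


# Hypothetical sample document.
doc = ET.fromstring(
    "<lib><book><title>A</title><author><name>X</name></author></book>"
    "<journal><title>B</title></journal></lib>"
)
index = number_elements(doc)
matches, joins = evaluate_path(index, ["lib", "book", "name"])
print(matches, joins)
```

In this sketch a path with N tags always costs N-1 joins, because each step is resolved independently from the element intervals alone. The abstract's claim is that attaching DTD structural information to the numbering lets DBXI avoid most of these joins.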
Keywords/Search Tags:XML, Query, DTD, Distance, Path expression, Numbering scheme, Indexing