Font Size: a A A

Indexing and path query processing for XML data

Posted on:2005-01-16Degree:Ph.DType:Dissertation
University:The University of ArizonaCandidate:Li, QuanzhongFull Text:PDF
GTID:1458390008986048Subject:Computer Science
Abstract/Summary:
XML has emerged as a new standard for information representation and exchange on the Internet. To efficiently process XML data, we propose the extended preorder numbering scheme, which determines the ancestor-descendant relationship between nodes in the hierarchy of XML data in constant time, and adapts to the dynamics of XML data by allocating extra space.; Based on this numbering scheme, we propose sort-merge based algorithms, EA -Join and EE -Join, to process ancestor-descendant path expressions. The experimental results showed an order of magnitude performance improvement over conventional methods. We further propose the partition-based algorithms, which can be chosen by a query optimizer according to the characteristics of the input data.; For complex path expressions with branches, we propose the Containment B+-tree (CB-tree) index and the IndexTwig algorithm. The CB-tree, which is an extension of the B+-tree, supports both the containment query and the reverse containment query. It is an effective indexing scheme for XML documents with or without a small number of recursions. The proposed IndexTwig algorithm works with any index supporting containment and reverse containment queries, such as the CB-tree. We also introduce a simplified output model, which outputs only the necessary result of a path expression. The output model enables the Fast Existence Test (FET) optimization to skip unnecessary data and avoid generating unwanted results.; Also in this dissertation, we introduce techniques to process the predicates in XML path expressions using the EVR-tree. The EVR-tree combines the advantages of indexing on values or elements individually using B+-trees. It utilizes the high value selectivity and/or high structural selectivity, and provides ordered element access by using a priority queue.; At the end of the dissertation, we introduce the XISS/R system, which is an implementation of the XML Indexing and Storage System (XISS) on top of a relational database. The XISS/R includes a web-based user interface and a XPath query engine to translate XPath queries into efficient SQL statements.
Keywords/Search Tags:XML, Query, Path, Process, Indexing
Related items