Font Size: a A A

XML query processing and optimization

Posted on:2004-09-02Degree:Ph.DType:Thesis
University:University of Waterloo (Canada)Candidate:Zhang, HuiFull Text:PDF
GTID:2468390011462578Subject:Computer Science
Abstract/Summary:
As XML becomes more widespread as a standard representation for data, XML-based query languages and their evaluations are increasingly important. For this purpose, several XML based query languages have been proposed, including W3C's XQuery. However, query processors for XML data have only recently begun to be developed, with little work on query optimizations. Certainly, there are many alternative ways to process and optimize XML queries. In common with other researchers, we wish to capitalize on the extensive work invested in relational database technology. In particular, we take an algebraic approach with the expectation that it fits in the traditional relational framework for query processing and query optimization.; In this thesis, we define a query canonical form which provides a conceptually uniform vision of path expressions, element constructors and FLWR expressions in XQuery. The power of this canonical form is shown by identifying an important subset of XQuery that can be translated to this canonical form. Moreover, this canonical form nicely separates different aspects of an XML query, i.e., structure, navigation, and condition. This property makes it easy to be extended, and a possible extension of the canonical form is presented.; Having this canonical form, we present an algorithm to translate from it into an extended relational algebra that includes operators defined for the structured text datatype, and we prove its correctness. This algorithm can be used as the basis of a sound translation from XQuery to SQL, and the starting point for query optimization, which is required for XML to be supported by relational database technology.; Given an algebraic expression tree resulting from the translation algorithm, we address the query rewriting problems in the face of new operators. In addition to reusing and adapting relational optimization technologies, we develop query rewriting techniques that utilize structural information and develop a new set of algebraic rewriting rules. We demonstrate the potential optimization gained by applying these techniques.; Finally, we study how to query relational storage wrapped with XML views in our query processing and optimization framework, with a focus on how to rewrite an XQuery expression posed on XML views to an equivalent SQL query that is formulated against relational storage directly. This is achieved by utilizing characteristics of relational storage and the independence of relational algebra and physical algebra.
Keywords/Search Tags:XML, Query, Relational, Optimization, Canonical form
Related items