Font Size: a A A

Query processing and optimization for structural selection queries over XML data

Posted on:2006-11-14Degree:Ph.DType:Thesis
University:University of California, RiversideCandidate:Vagena, ZografoulaFull Text:PDF
GTID:2458390008458300Subject:Computer Science
Abstract/Summary:
Tree pattern matching is considered a core retrieval operation within XML query processing, as it enables selection of XML data based on their structural characteristics. Simple value-based schemes can effectively capture the structural relationships between XML data and convert structural constraints to value predicates. The first part of my thesis demonstrates that standard index structures (i.e. B+-trees) can be integrated with such schemes to achieve considerable performance gains while processing structure-aware queries, as they effectively provide early pruning of input data that contain no useful information for a given query. In particular, a family of novel index-based structural matching algorithms has been designed and implemented and their clear advantage over all previous relevant methods has been experimentally demonstrated. In subsequent work it is also demonstrated that processing methods based on similar principles can be utilized to handle generalized tree pattern queries with relaxed semantics.; The second part of my thesis considers the integration of all existing tree pattern query processing methods under a common optimization framework. New opportunities, derived from the tree structure of the data, have been explored and a new optimization strategy has been proposed, which avoids inherent problems of traditional solutions, namely large search spaces and heavy dependence on intermediate result size estimation. In particular, based on avoiding plans with unnecessarily large intermediate results, new holistic processing algorithms, which can exploit existing access methods and which present performance guarantees have been developed. As part of the optimization process, those holistic approaches are combined in a cost-based fashion. The holistic nature of the algorithms enables the definition of a cost model that (i) mitigates the propagation of estimation errors and (ii) enables global optimization while examining a small search space that explores only local decisions.; Subsequently, the problem of structural selection over XML data within an environment where different versions of the data can co-exist, is investigated. Within such environments users wish to retrieve portions of document versions at will. Path expression queries can effectively specify the portions of a particular version to be retrieved. Techniques that facilitate data version management and the ability to answer such queries over document versions are thus important. As part of this work novel storage schemes which integrate all document versions in a space-efficient manner, while enabling efficient structural selection have been designed and proposed.; While the work in the first three parts of the thesis targeted the unordered model of XML data, which is sufficient for the data-centric aspect of the language, order becomes of great importance when document centric applications are considered. Order-based queries include positional constraints and XPath navigation axes such as the following-sibling axis. The forth part of my thesis deals with the problem of efficient evaluation of such order-based queries. Novel holistic algorithms that evaluate those queries in a unified way, and avoid materialization of large intermediate results have been proposed. To the best of our knowledge, this is the first work that provides a complete, scalable, XML model-aware solution to the problem of supporting order within XML query processing.
Keywords/Search Tags:XML, Query processing, Queries, Selection, Optimization, Over, Work
Related items