Font Size: a A A

Research On Efficient Processing Of XML Twing Pattern Queries

Posted on:2010-06-13Degree:DoctorType:Dissertation
Country:ChinaCandidate:J H JiangFull Text:PDF
GTID:1118360302458565Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the widely used in E-Commerce, digital library and Web services, etc, XML has been the de-facto standard for data representation and exchange. The number of XML documents around the world is growing at an alarming rate. Meanwhile, the management and retrieval of the growing XML data has become an important research area and observed wide attention from researches.This thesis studies the query processing of XML twig pattern queries, which is a core subset of XML query languages. Matching a twig pattern query means finding all the occurrences specified by a selection predicate on multiple elements of the twig pattern tree embedded in the XML data tree. Efficient processing of XML twig pattern queries is a core operation of XML query processing. Due to the semi-structure of XML, the complexity and diversity of XML twig pattern queries, traditional database technologies can not solve the problem of XML twig pattern queries processing efficiently, and it still exists many technique challeges in XML index, query optimization and efficient twig pattern matching algorithms. Thus, efficent processing of XML twig pattern queries is still an import and challenging research topic.To tackle the major issues in efficient processing of XML twig pattern queries, in this paper, we focus on the following key technologies: XML index, minimization of twig pattern queries, selectivity estimation for XML twigs, the cost estimation model, efficient matching algorithms for different types of XML twig pattern queries, and so on. Then, on the basis of them, a novel and effective universal framework for XML twig pattern queries processing is proposed. The main contributions and innovations of this paper are summarized as follows: Universal model of XML twig pattern queries processing frameworkWe first analyze the semi-structure of XML data model as well as the structural characteristics of XML twig pattern queries, and then a novel unified XML twig pattern queries processing framework is proposed. Based on the multi-level index management model, by integrating a variety of XML twig pattern matching algorithms and choosing the algorithm according to a series of heuristic optimization rules, this framework provides unified and efficient XML twig pattern queries processing services. Then, within the framework, we present our research on the following XML twig pattern queries processing problems: 1) holistic path-join algorithm for simple XML twig pattern queries; 2) efficient processing of complex XML twig pattern queries, and 3) index-based algorithms to skip useless elements:Research on efficient processing of XML twig pattern queries based on path-joinBased on the study on the decomposition granularity of the twig join algorithms, we propose a novel holistic path-join algorithm, named TJFGeneric. We divide the twig pattern into individual paths, and process the simple twig pattern queries in a path-join way from bottom to up. Thus, the number of the join operations is reduced sharply, and labels of the only leaf query nodes are accessed. Then, we study the labeling scheme of the path join algorithm. In order to support a wide range of labeling schemes, standardized interfaces are used to impove the universality of the algorithm.Research on efficient processing of Ordered XML twig pattern queriesWe propose a new path join algorithm, named OTJFast, to process Ordered XML twig pattern queries efficiently. In this problem, we design a new data structure, named Children Linked Stacks (CLS). By storing and checking the partial results in CLS during the path join processing, the orders of the elements are judged, and the queries can be processed in a holistc way. The experimental results show that our approach is both efficient and effective, in terms of the size of I/O accessed and executing time.Research on efficient processing of AND/OR-twig pattern queriesWe first introduce a new concept, named AND/OR Branching Extension (AOBE), to define the problem of matching an AND/OR-twig pattern query concisely. Then, based on AOBE, a new holistic path join algorithm, called ORTJFast, is proposed to process AND/OR-twig queries efficiently, which matches the subtrees rooted with branching nodes recursivly from bottom to up. Extensive experimental results show that ORTJFast has better performance than previous algorithms.Research on index-based algorithms with optimization rulesBased on above algorithms, we present three new index-based algorithms named TJFGeneric~+, OTJFaster, ORTJFast~+, by incorporating several effective optimization rules. These work well on available indices (such as B~+-tree), skipping useless elements. Thus, not only is disk access reduced greatly, but also many unnecessary computations are avoided. The experimental results prove that the optimized algorithms perform significantly better than existing proposals.Overall, in order to provide practical solutions for large-scale application of XML query processing, this paper presents a series of new efficient holistic path join algorithms for different types of twig pattern queries, and on the basis of them, a new unified and efficient framework for XML twig pattern queries processing is proposed.
Keywords/Search Tags:XML, twig pattern queries, holistic twig join, path join, query optimization
PDF Full Text Request
Related items