Font Size: a A A

Based On The Serialized Efficient Xml Query Algorithm

Posted on:2009-12-06Degree:MasterType:Thesis
Country:ChinaCandidate:Y K HuangFull Text:PDF
GTID:2208360272958964Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the adoption of XML in a wide range of applications, XML query processing has become an important research topic. Evaluating twig queries efficiently is at the core of structured query processing. To evaluate a twig query, most current approaches disassemble the query into multiple root-to-leaf simple paths; with the assistance of some indexing structures, these simple path queries are evaluated independently; finally, the results of these sub queries are joined to generate the final answer. However, joining intermediate results is an expensive operation. Indeed, it has become the most significant cost in evaluating twig queries.Eliminating join operations dramatically reduces the cost of query evaluation. To avoid expensive joins, sequence-based methods convert both XML documents and queries into sequences, and answer twig queries as a whole through subsequence matching. However, due to the incompatibility between structured data and sequential data, sequence-based methods are under optimized for performance, as their first priority is given to ensuring representation and query equivalences.In this paper, we remove the obstacle hindering the true potential of sequence-based methods. We introduce a novel sequence-based approach, LEO, for XML query processing. In this approach, we present a flexible XML sequencing strategy, this strategy enables us to choose a sequencing method that maximizes the sharing of sequences to maintain a minimal index structure. With the help of a super tree, we can enhance the range label method to get unified 3-tuple labels for document encoding. This is the requirement of repensentation equivalence and the fundation of sequencing flexibility. We also present a flexible query processing method. It performs subsequence matching based on the selectivity of the elements in the sequence. With the data statistics, our algorithm picks the optimal matching order to evaluate a query request. Thus, the sequenced-based approach introduced in this paper can fully and independently optimize both index size and query response time.We implement the algorithms in LEO and conduct extensive experiments. The study of extensive experiments verify the efficiency of this approach. By comparing with both sequence-based and non-sequence-based approaches, the results show that our approach gain the improvements on both the index size and the query response time, and our method has higher query efficiency and substantially lower storage cost than previous approaches.
Keywords/Search Tags:XML Query Processing, Index Technology, Sequence-based Approach
PDF Full Text Request
Related items