Font Size: a A A

Study On XML Query Based On Path Summary

Posted on:2011-10-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:X S XuFull Text:PDF
GTID:1118330338488237Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of the technique of internet network, XML has become the de facto standard for information representation and exchange on the Word Wide Web. An XML document consists of semi-structure data, which do not satisfy any normal form in relational databases, so RDBMS cannot process it directly. How to store and query XML data based on relational database management system has become a hot topic recently. In this paper, after one of structural features in an XML document, path summary, is represented, path-partitioned encoding scheme is proposed. Many kinds of XML queries are studied based on the encoding scheme. DMXML, an prototype system on the platform DM5.0, is designed and implemented to obtain experimental evaluation.When path summary is extracted from an XML document, path-partitioned encoding scheme is proposed by combining structure mapping with model mapping. XML data are mapped to a set of relational tables by the scheme so that they can be stored into RDBMS successfully. The scheme is characterized by concise encoding and plain paths, supports for the mature disk-based B+ tree index, and makes a chance to update XML data in the future .According to path summary information, A solution that the expressions for path query are translated into SQL statements equivalently is provided. For a tree pattern belonged to subclass P{[],/}, after level distances between nodes is calculated in the tree pattern, they are equally changed into a sequence of LCA(Least Common Ancestor) relation after traveling the tree pattern by depth-first order. The sequence can generate a nested SQL statement by guiding formal rules. Especially, every simple linear path query can be uniquely corresponding to a SQL statement with an integer interval which comes from primary keys on RDBMS. The results of experiments indicate that path-partitioned encoding scheme speeds linear twig queries, and guarantees the I/O optimality.In order to improve the performance of XML query, the algorithm of structural join based on a path join graph is proposed. In the light of path summary, structure-constrained nodes are defined in the tree pattern, and their analytical path sets are calculated to eliminate descendent axes (//) and wildcards (*) and generate DM XML pattern sets. Every twig pattern in these sets are belonged to subclass P{[],/} without descendent axes or wildcards so that they can select elements from RDBMS to take part in structural join. Because the basic granularity is decomposed not by nodes in the tree pattern but by a simple linear path in analytical path sets, the algorithm of matching pattern can decrease structural joins, and improve the performance for twig queries.To handle twig queries with predicates such as AND,OR and NOT, a matching method is proposed. A usual approaches decomposing them into disjunctive normal formulas where not predicates only appear in leaf nodes, however, the number of disjunctive formulas is exponential to the size of the original query in the worst case. In this paper, a complex twig query is modeled as an XPattern extended from tree pattern with logic predicates. There are five types of nodes: data node (DNode), return node (RNode), logical-AND node (ANode), logical-OR node (ONode) and logical-NOT node (NNode) in the XPattern. Three operations such as simplification, normalization, and initialization are built on the XPattern. Simplification focuses on reducing size of XPatterns in order to guarantee I/O and CPU optimality in matching complex twig queries; normalization is committed itself to adjusting child order in parent nodes in the XPattern to assure correctness of matching twig queries; initialization deals with reading complete and sound elements likely participating in query answers from a relational database. After structure constraints on nodes are discussed under path summary, formalized method of answering twig queries with compound and nested predicates is described. Path cursors are identified with a couple of simple linear paths advances scanning elements organically during the matching procedure to decreases scanned elements and structural joins, and keep upward compatibility.DMXML, a prototype system based on DM5.0, is implemented. A series of experiments using XMark benchmark is designed to test DMXML's performance. The experiment result indicates path-partitioned encoding scheme and algorithms for XML queries effectively support XQuery processing. The whole performance of our prototype system is up to assigned requirements.In the paper, XML database based on path summary is studied, and some key technologies and novel methods are obtained to push XML database ahead. The above techniques have been implemented in DMXML, which provides a unified experiment platform for twig query research.
Keywords/Search Tags:XML, RDB, twig query, structural join, path summary, tree pattern
PDF Full Text Request
Related items