Font Size: a A A

An XML Twig Pattern Matching Approach Based On Dewey Encoding

Posted on:2009-09-10Degree:MasterType:Thesis
Country:ChinaCandidate:L G FuFull Text:PDF
GTID:2178360272463523Subject:Computer applications
Abstract/Summary:PDF Full Text Request
Over years with the development of the network and database technologies, XML(eXtensible Markup Language) has already become the standards of the expression, integration and exchange of data on the Web. Compared with other markup languages XML has simple form and strong self-describing ability and realizes the separation of content, structure and expression, and is more adapt to the expression and exchange of data. Therefore, XML is used widely in various fields of research and electric business. Nowadays there is much XML data on the web. How to effectively query the XML document has become an important research topic.To query XML documents, the algorithms which are given at home and abroad nowadays are mainly based on the two ideas below: The first is to decompose the query expression to several simple paths, then match the simple paths in the documents and finally join the intermediate results. The method inevitably produces many useless intermediate results. The other is the twig pattern matching approach, which expresses the query expression as a pattern tree that is called "twig pattern", then find its match in XML document tree. All output results can be got through scanning the input node lists only once. The method reduces the algorithm's I/O and CPU complexity and space complexity and improves the efficient of query. Nowadays the typical twig pattern matching approaches are TwigStack and TJfast. However, these algorithms are not optimal when the query expressions contain more P-C relationships, and the common point of the algorithms is that they do not use the path index of XML documents.For improving the efficient of the twig pattern matching approach more, in the paper we give a novel twig pattern matching approach TwigJoin based on Dewey encoding with the path index of XML document JoinGuide. The main works are summarized as follows:(1) Combining the advantages of the region encoding and the Dewey encoding, we encode the instance nodes in XML documents with Dewey encode scheme and encode the index node in the JoinGuide with region encode scheme.(2) Improves the TwigStack algorithm, at first we find all the matching index node in the index JoinGuide , then get the results through scanning the leaf node lists in the twig pattern.(3) Theoretically compares TwigJoin with TwigStack and I-XISS and analyses them, then shows the advantages of TwigJoin.(4) Finally constructs an experimental system TwigJoin and implements the twig pattern matching approach, then compares the TwigJoin with TwigStack and I-XISS, and the results show that the efficient of our approach is improved greatly.The future work is: the first is to improve the efficient of parsing XML documents and setting up the path index of the TwigJoin system, then improve the query algorithm and enforce the ability for querying all kinds of axis to enable it adapt to more complicated query expressions.
Keywords/Search Tags:XML, Encoding, JoinGuide Index, Twig Pattern Matching, Structure Join
PDF Full Text Request
Related items