Font Size: a A A

Research On XML Query Pattern Matching And Document Filtering

Posted on:2010-06-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:B NingFull Text:PDF
GTID:1228330371950169Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
XML (eXtensible Markup Language) has become the standard of the data representation and data exchange, because of its characteristics of self-description and independence of platform. More and more structure data or semi-structure data is stored and exchanged in the form of XML, therefore it is important to solve the problems of query processing and filtering of XML documents.This thesis addresses several key technical problems of query processing and filtering of XML document, which are the basic research directions in the data management area. The twig pattern matching is the core problem of query processing, and there are many research works to deal with this problem. This thesis also addresses the twig pattern matching problem and a new index structure is proposed to accelerate the matching process. Nowadays, the research works focus on the query pattern with only forward axes, while in the real applications, many query patterns with ancestor and parent axis are proposed, and there is not any algorithm of matching those complex patterns holistically. In the research area of SDI (Selective Dissemination system of Information), how to manage the users’subscriptions and the filtering algorithms are the core problems. The formal works focus on the indexing of XPath with forward axis only, therefore the filtering algorithms can filter the documents with those XPath expressions with forward axis only. This dissertation deeply studies corresponding problems in the area of XML query processing and filtering of XML document. Its contributions are summarized as follows:(1) To accelerate the twig pattern matching in the query processing, a new index C-Tree is proposed. By indexing the context of elements in the XML documents, we store the context information among the tag streams which is ignored in the traditional tag streams scheme of former algorithm. Based on C-Tree, a serious of algorithms for twig joins are proposed, and those algorithms canjump lots of useless elements in respective tag stream. Therefore the new proposed algorithms are efficient.(2) In the current research area of XML query processing, most word are focused on matching the pattern with only descendant and child axis. To improve the expression abilities of the query expression, we proposed a new kind of query pattern named xtwig pattern which contains parent or ancestor axis. Based on the characteristics of itself and the information in DTD, a serious of rules of simplification is proposed. Then the relationship among the answers of xtwig pattern is analyzed, and a new structure XHyperCube for storing the intermediate results is designed. For matching the xtwig patterns, this dissertation proposes a holistic algorithm named XtwigStack which does not decompose the xtwig pattern into twig patterns. At last, the experiments show that Algorithm XtwigStack is an effective and efficient.(3) For the problem of filtering documents by large amount of XPath expressions in SDI system, a new method to manage the XPath expressions named NIndex is proposed, which have many advantages. Firstly the NIndex support the ancestor and parent axis in XPath expressions, which provides more challenge for users’profile descriptions. Secondly, there are large amount of common sub-patterns are shared in NIndex. The sharing reduces the space complexity of managing XPath expressions, and avoids the multiple probes and processing of those common sub-patterns. Based on NIndex, a new document filtering algorithm is proposed, to disseminate the XML documents according the XPath expressions proposed by users. The XML document is modeled as SAX model, and the algorithm is event driven by the SAX events. The experiments show that the algorithm basing the NIndex is accurate and efficient.
Keywords/Search Tags:XML, query processing, pattern matching, index, SDI, twig pattern, xtwig pattern
PDF Full Text Request
Related items