Font Size: a A A

TJDirect: An Holistic Twig Pattern Matching Algorithm Of XML Document Based On IFED Encoding

Posted on:2008-01-24Degree:MasterType:Thesis
Country:ChinaCandidate:J F ShiFull Text:PDF
GTID:2178360242969504Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
XML has already become the standards of the expression, integration and exchange of the data on web. It has simple form and strong self-describing ability. Beside, it realizes the separation of the content, structure and expression, so that it is more adapt to data expression and exchange. During recent years, XML is widely used in various fields, and its data has abundantly appeared on web. How to query XML document become an important research topic.Querying XML document is usually transformed to join two nodes' list. There are two ideas about joining: one is to decompose it to several simple paths, match them and then join the results. The method inevitably produces useless result. The other is to match holistic twig, that is, to convert query path to a tree structure named twig pattern and then find its match in XML document tree. The method reduces algorithms' I/O and CPU complexity and space complexity. It makes query more efficiently. TJfast: a holistic twig matching algorithm proposed recently based on Extended Dewey encoding can efficiently deal with XML document query, but ExtendedDewey encoding doesn't support dynamic insertion and the performance of TJfast need improve.In this paper, we make improvement aiming at the two shortcomings mentioned above. The main contributions are summarized as follows:(1)We propose a new encoding scheme named IFED(Insert-Friendly ExtendedDewey),which improved from ExtendDewey. It supports both efficient query and dynamic insertion.(2)Algorithm TJfast has I/O and CPU time complexities linear in the sum of sizes of the n input lists and the output list. But some nodes in input lists which do not contribute to final results may be skipped. In this paper, we propose a more flexible twig pattern matching algorithm named TJDirect, which skip some nodes which won't be joined. Algorithms' performance is improved. Furthermore, the algorithm doesn't use sets associated with matching nodes, so the space complexity is reduced.(3)We compare the execution time of TJfast based on IFED and TJDirect based on IFED through experiments and conclude that TJDirect is more efficient than TJfast.(4)We encode for XML documents to make a survey of the space of IFED encoding and ExrendedDewey encoding and make a conclusion that the space of IFED is not very large.
Keywords/Search Tags:XML document, query, holistic twig pattern, encoding, structural joins
PDF Full Text Request
Related items