Font Size: a A A

A New XML Coding Scheme And Its Structure Join Algorithm

Posted on:2012-08-09Degree:MasterType:Thesis
Country:ChinaCandidate:J X SangFull Text:PDF
GTID:2218330338466702Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With respect to XML query processing technology, the prior art mainly focuses on path decomposing and collection matching, which is much more effective than traditional navigate traversal. Furthermore, node coding and structure join are two prominent approaches for this technology. Based on the advantage of these two approaches, this thesis provides a brand-new XML coding scheme, named BTB, and a new structure join algorithm-BTBContainJoin.Based on the analysis of the shortcomings of the existing code scheme and the coding theory of Huffman, a new binary-tree-based coding scheme is provided in this thesis, in which a binary bit serves to save the path information of one branch and each code of the parent node is the ex-substring of its child node. Therefore, the structure relationship between nodes can be determined by the operation of the coded strings, and the node's level can be determined by its length instead of saving path information. BTB code transfers the XML document into a binary tree and code each node based on the path. The code is in the form of binary string, with each string representing an edge of the binary tree to save the node path. BTB code introduces downshift identification character to support the structure relationship determination between parent and child and the determination of the "brotherhood". BTB code has the feature of prefix coding, but is superior to prefix coding on the costs of saving path information. Experiments show that, BTB coding has better CPU performance and shorter code length.Furthermore, based on the analysis of the disadvantage of existing structure join algorithm, this thesis provides a new partition approach to BTB coding space and a new structure join algorithm-BTBContainJoin. The algorithm partite not only the code space, but the input set. Operation of structure join are executed between the subset after the partition, and the union of the joint results of all the subsets is the whole gaining of the structure joint. Technically, data sorting is not necessary, which results in the shorter time and less space. Experiment shows that BTBContainJoin has good CPU performance.
Keywords/Search Tags:XML, XML query, Coding scheme, Structure join, Binary tree
PDF Full Text Request
Related items