Font Size: a A A

Study On Structural Index Technology And Query Optimization For XML

Posted on:2004-03-20Degree:MasterType:Thesis
Country:ChinaCandidate:S T GuoFull Text:PDF
GTID:2168360095456771Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Various index techniques and join algorithms [12,13,14,15,16,23,24] have been recently proposed, in order to realize query optimization for XML. The indices are built on the tags and the element values. Nevertheless, some indices do not contain all element nodes, many paths need to still be examined in the query; other indices produce redundant data in the preorder or postorder traversal, this makes the cost of query much more. In the proposed join algorithms, although some algorithms such as MPMGIN algorithm [23], outperform standard RDBMS join algorithms, they perform a lot of unnecessary computation and I/O for matching basic structural relationships, especially in the case of parent-child relationships; other algorithms such as the Stack-Tree-Desc algorithm [24], represent the state-of-the-art in structural joins, however, they do not utilize indexed structures but sequentially scan the input lists. Thus, I/O's can be wasted for scanning element that do not participate in the join, and join speed can be influenced.According to this situation, the main works and contributions in the paper are as follow:① As it is inconvenient to update that the conventional numbering schema is used to represent the structure of XML document, a sparse numbering schema based on improvements is proposed in this paper. By comparing with the conventional method, the sparse numbering schema has some merits as follow: the values of start and end do not recomputed, when a new node is inserted, the updating efficiency of tree structure is improvement, and the XML documents are traversed only once, when the schema is constructed, this further decrease the cost of building tree, and the schema can provide a durable conference for index.② As the storage approach of the numbering schema is scarce, this paper proposes a new approach that the sparse numbering schema is stored in the relational database. By utilizing this storage approach, the indices can be easily built on the start column, and the storage space can be mostly reduced.③ This paper refers to the indexing technologies of B~+-tree in DBMS, and combine it with the sparse numbering schema, and proposes a new indexed structure — B~+-tree structural index. It is very important for the optimization of join operation and element location in XML query. By introducing the pointers for further improving the indexed structure, this paper proposes a B~+-tree structural index with sibling pointer (B~+-sp for shorten). This structural index can avoid the defect of always traversing B~+-tree from the boot.④ Based on B~+-sp, this paper proposes Anc-Desc- B~+-sp join algorithm. It is theoretically analyzed that the time complexity of the join algorithm (O(|A|+log|A|)) is obviously less than that of the Stack-Tree-Desc algorithm (O(|A|+|D|+|outlist|) [24], because of |D|≥|A|, |D|+|outlist|>>log|A|. Experiment results primarily prove that the join algorithm is more efficient and quick join algorithm.In XML query, the other important factor of influencing the query time is the problem of the location of XML data source. In order to resolve the problem, this paper proposes a distributed XML data source location system frame, called Cooperative XML Search Engine (CXSE). CXSE can shorten collection time by searching based site selection and⑤ scoring Web document. Accordingly, CXSE really realize to quickly and correctly locate the URL of XML document needed. Moreover,the retrieval system is available to several XML data source in XML query.
Keywords/Search Tags:Numbering Schema, storage, B~+-tree structural index, join algorithm
PDF Full Text Request
Related items