Research On XML Indexing And Query Processing

Posted on:2010-08-11

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H Q Wang

Full Text:PDF

GTID:1118360278495869

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Extensable markup language (XML) is a specification created by W3C organization to help developer to organize information freely. It aims to not only fulfill the need of increasing network application, but also provide well reliability and interaction when coorperating with network. XML has four features, namely well formated, extensable, highly structured and convient for network transmission. These features help XML gains extrodenary performance.Recently, there are many techniques on XML data storage and query. But these techniques can not satisfy the requirement of high efficiency on data processing, and usually these techniques do not use index or process queries based on database systems. So the performance of processing large or complex structured XML data is bad.In this paper, we aim to process XPath query on XML efficiently. We propose several labeling scheme suitable for XML tree structure, and organize them as index. Based on the labeling scheme, we propose high efficient algorithms to support XPath query, the main contribution of this paper are:(1) We propose ClusteredAbsolutePath (CAP for brief) labeling scheme and corresponding algorithms. A CAP is defined based on the position of a child node to its parent node in XML definition. So the CAP labeling scheme supports position predict in XPath natually. A CAP can group nodes with similar position and then boost query processing. Based on CAP labeling scheme, we propose BranchFilter and RelatedPathJoin algorithms to process query efficiently. Experiments show that these two algorithms outperform TwigStack algorithm and have good scalability.(2) We propose ClusteredChainPath (CCP for brief) labeling scheme and corresponding algorithms. A CCP groups nodes with same source path and their ancestors as a tree. Nodes on the tree with same depth are bi-similar. All leaf nodes of these trees is a partition of XML nodes. Then the index structure of CCP is the same of 1-Index. Because the ancestors are saved, we can quickly get the ancestor set of a given node set to boost query processing. Based on CCP labeling, we propose RelatedPathJoin algorithm to process query efficiently. Experiments show that algorithm RelatedPathJoin outperforms TwigStack algorithm and has good scalability.(3) We propose SourcePathTree labeling (SPT for brief) by using node region to represent continious node sequence. We optimize the data storage to make the I/O cost minimum. We design algorithms to group related path before join, then only SPT with same related path can be joined, otherwise, they are abandoned without being loaded into memory. Experiments shows that the refined labeling scheme takes smaller storage, smaller I/O cost and better performance.(4) We propose PrimeSequence labeling scheme and corresponding algorithm. PrimeSequence labeling scheme can be used to create an F&B-Index and support query processing efficiently. In PrimeSequence labeling scheme, a node is labeled as a string of primes or production of several distinct primes, if two nodes have the same labeling, they are forward and backward bi-similar. We also propose the algorithm to create region encoding and CCP encoding on F&B-Index, then we can run TwigStack and RelatedPathJoin on F&B-Index. An algorithm based PrimeSequence labeling scheme called DivisionMatch is proposed for query processing, the algorithm can fast confirm subtree structure of a given node. We compare the three algorithms on several dataset and conclude that DivisionMatch algorithm based PrimeSequence labeling scheme performs better than TwigStack and RelatedPathJoin on F&B-Index.

Keywords/Search Tags:

XML, Indexing, Query Processing, Clustered Absolute Path, Clustered Chain Path, Prime Sequence

PDF Full Text Request

Related items

1	Study On The XML Indexing Technique Based On Schema
2	Research On Stochastic Clustered-Dot Screen And Path Planning Based On Voronoi
3	Research On Clustered Superscalar Processor
4	Indexing and path query processing for XML data
5	The Research Of Quering Method Over Encryption Character Data In Database Based On B*Tree Clustered Index
6	A Study Of RFID Data Processing Based On A Path Encoding Scheme In Supply Chain
7	Clustered wireless sensor networks
8	The Research Of Clustered LR-pon Based-on Multi-wavelength And Multi-frequency And Its DBA Algorithms
9	Microarchitecture and compilation support for clustered instruction-level parallel processors
10	Research Of Linux-based Clustered Database System