Research On Query Processing On XML Data

Posted on: 2009-03-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Z Wang
Full Text: PDF
GTID: 1118360278462012
Subject: Computer software and theory
Abstract/Summary:
Because of its scalability and flexibility, XML has become the standard for data representation and exchange on the web. In many applications, XML is used as the representation format for more and more data. Because XML data is semi-structured, processing it brings new challenges, and the management of XML has become an important research field within data management. One of the important problems of XML data management is how to query XML data efficiently.

This dissertation aims at efficient query processing of XML data. It studies techniques for query processing on tree-structured XML data, query processing on graph-structured XML data, query processing on XML streams, and query processing in XML-based information integration systems. The main contributions are as follows:

(1) A systematic method for efficient processing of path queries on tree-structured XML documents is presented. In detail, a disk storage structure is presented for tree-structured XML documents. This storage structure combines a structural index, coding-based join, and tree traversal, and efficiently supports complex path queries. Implementations of several query operators on top of this storage structure are proposed. For path queries with complex structures and value constraints, a cost model and a cost-model-based query optimization strategy are presented. Experimental results show that this query processing method has high efficiency and scalability, and that the query optimization strategy generates query plans effectively and efficiently.

(2) Query processing techniques for subgraph queries and topological queries on graph-structured XML documents are presented. In detail, a reachability labelling scheme for DAGs is extended to support graphs with cycles, and a storage strategy for this labelling scheme is presented. Based on the reachability labelling scheme, subgraph query processing strategies are presented. These strategies can process general subgraph queries with reachability relationships. With slight modification, they can also process subgraph queries with both reachability and adjacency relationships. Experimental results show that these strategies process subgraph queries efficiently. A novel kind of query on graph-structured XML documents, the topological query, is presented together with efficient query processing algorithms.

(3) Aggregation queries on XML streams are proposed. To the best of our knowledge, this is the first study of this problem. A precise definition of the problem is presented. An efficient processing algorithm is proposed that supports aggregation queries on XML streams with complex XPath expressions and various kinds of aggregation functions. The algorithm is also suitable for SAX-based aggregation query processing on XML documents. Analysis and experimental results show that the algorithm has high efficiency and scalability.

(4) For query processing in an XML-based information integration system, this dissertation focuses on the problems of partial result transmission, partial result merging, and data source selection. In detail, two data compaction strategies for partial result transmission are proposed. Join operators for XML data segments in an XML-based information integration system are presented to describe the join in various instances, and efficient algorithms are designed for these join operators. An index structure for data source selection is presented. This index structure captures both the value information and the structural information of data sources, so that it supports data source selection for queries with complex structures and value constraints. Two index compaction strategies are proposed to keep the index practical even when the data in the data sources is complex. Experimental results show that this data source selection strategy has good precision and efficiency, and that the index compaction strategies effectively reduce the index size without losing precision in data source selection.
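To make the idea of SAX-based aggregation in contribution (3) concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in the dissertation: it assumes a simple absolute path made only of child steps (no predicates, wildcards, or descendant axes) and numeric text in the matching elements. The names `PathAggregator` and `aggregate` are hypothetical and introduced only for illustration.

```python
import xml.sax


class PathAggregator(xml.sax.ContentHandler):
    """One-pass aggregation over elements matching a simple child-step path.

    Illustrative assumption: the path looks like /a/b/c and every matching
    element contains numeric text.
    """

    def __init__(self, path):
        super().__init__()
        self.target = path.strip("/").split("/")
        self.stack = []        # names of the currently open elements
        self.buffer = None     # text pieces of the matching element being read
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == self.target:
            self.buffer = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so collect them.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self.stack == self.target:
            value = float("".join(self.buffer))
            self.count += 1
            self.total += value
            self.minimum = value if self.minimum is None else min(self.minimum, value)
            self.maximum = value if self.maximum is None else max(self.maximum, value)
            self.buffer = None
        self.stack.pop()


def aggregate(xml_text, path):
    """Return count, sum, min, max and avg for the given path in one pass."""
    handler = PathAggregator(path)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    avg = handler.total / handler.count if handler.count else None
    return {
        "count": handler.count,
        "sum": handler.total,
        "min": handler.minimum,
        "max": handler.maximum,
        "avg": avg,
    }


if __name__ == "__main__":
    doc = (
        "<catalog>"
        "<book><price>10</price></book>"
        "<book><price>30.5</price></book>"
        "<cd><price>99</price></cd>"
        "</catalog>"
    )
    print(aggregate(doc, "/catalog/book/price"))
```

The handler keeps only the stack of open element names and a few running totals, so memory use depends on document depth rather than document size. That is the property that makes event-based parsing a natural fit for stream aggregation.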
Keywords/Search Tags: XML, query processing, tree-structured XML document, graph-structured XML document, XML stream, XML-based information integration