Research On Query Supporting XML Data Compression

Posted on:2006-02-18

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W S Zhang

Full Text:PDF

GTID:1118360155468798

Subject:Computer application technology

Abstract/Summary:

In recent years, as XML has become an emerging standard for information exchange on the World Wide Web, an enormous amount of Internet data is expected to be encoded in XML in the near future because of its extensibility and cross-platform nature. However, XML documents in their textual form are rather verbose: the textual, repetitive nature of XML tags and of several XML data types wastes disk space and hinders query processing. How to compress XML data efficiently and evaluate XPath queries over the compressed data is therefore a fundamental problem. This thesis studies methods of XML data compression with query support in depth, including XML data models, schema formalisms and decomposition, similarity analysis of XML documents, frequent subtree mining, tree-grammar-based compression, and pushing queries down to compressed data using signature automata.

The main research work and specific contributions of this thesis cover the following aspects:

The research history and state of the art of XML are summarized, and XML data management technologies are analyzed. The disadvantages of existing XML data compression methods are analyzed in detail, and the directions and goals of research on XML data compression are set out.

A normal form for XML documents, XK-NF, is proposed, based on DTD path expressions. The advantage of this definition is that it can express key constraints together with three forms of functional dependency. Based on these normalization rules, a decomposition algorithm for XML schemas is proposed to reduce data redundancy, a step not addressed by other XML compressors.

A tree-grammar-based method for compressing XML data is put forward. Since redundant data appears not only within a single XML document but also across different documents, the method is designed to exploit both kinds of redundancy.
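As a rough illustration of the tree-grammar idea, the sketch below uses the simplest kind of tree grammar: every complete subtree that occurs more than once becomes a rule with its own nonterminal (shared-subtree, or DAG, compression). This is a swapped-in simplification, not the thesis's method. The thesis mines frequent substructures with bound variables, whereas this sketch only shares identical subtrees, ignores attributes, and treats element text as part of the node label. The function names `collect_keys`, `term` and `build_grammar` and the nonterminal names `N1`, `N2`, ... are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def collect_keys(elem, keys):
    """Return a hashable key (tag, text, child keys) for the subtree rooted at
    elem, appending the key of every subtree to `keys` in post-order."""
    text = (elem.text or "").strip()
    key = (elem.tag, text, tuple(collect_keys(child, keys) for child in elem))
    keys.append(key)
    return key


def term(key, names, top=False):
    """Render a subtree key as a term, replacing shared subtrees by nonterminals.
    `top=True` expands the key itself (used for rule bodies and the start term)."""
    if not top and key in names:
        return names[key]
    tag, text, children = key
    label = f"{tag}={text!r}" if text else tag
    if not children:
        return label
    return label + "(" + ",".join(term(c, names) for c in children) + ")"


def build_grammar(xml_text):
    """Compress an XML document into (start_term, rules) by sharing every
    complete subtree that occurs more than once. Attributes are ignored."""
    root = ET.fromstring(xml_text)
    keys = []
    root_key = collect_keys(root, keys)
    counts = Counter(keys)
    names = {}
    for key in keys:  # post-order: inner subtrees get lower numbers
        if counts[key] > 1 and key not in names:
            names[key] = f"N{len(names) + 1}"
    rules = {name: term(key, names, top=True) for key, name in names.items()}
    return term(root_key, names, top=True), rules


if __name__ == "__main__":
    doc = ("<lib><book><title>XML</title><year>2006</year></book>"
           "<book><title>XML</title><year>2006</year></book>"
           "<book><title>DB</title><year>2006</year></book></lib>")
    start, rules = build_grammar(doc)
    print(start)  # lib(N3,N3,book(title='DB',N2))
    for name, body in rules.items():
        print(name, "->", body)
```

Each rule body is stored once and referenced wherever the subtree recurs. A practical compressor would also inline rules that end up referenced only once (such as `N1` here, which now appears only inside `N3`) and encode the grammar compactly.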
To compress XML data while still supporting queries, the raw XML documents are first grouped into clusters using k-means. Next, within each cluster, a frequent-substructure mining algorithm similar to FP-growth generates the compression dictionary. Finally, following the tree-grammar approach, subtrees are replaced by frequent substructures with bound variables.

Querying compressed data is also studied. A significant portion of this thesis is devoted to querying compressed XML data, including an analysis of the indexing schemes and query methods used by other XML compressors. Queries are evaluated efficiently using a signature index and signature automata, without full decompression.

A method for compressing access control rules with query support is given. To cope with duplicated access control rules, a rule pruning method is proposed for XML data access control under the DAC model, which compresses the access control rules effectively. In addition, a query algorithm is presented for the compressed access control map.
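The rule-pruning idea can be sketched as follows, under assumed conventions that are not taken from the thesis: access is decided for a single subject, an explicit rule on a node propagates to its descendants until a more specific rule overrides it, and a node with no applicable rule is denied. Under that policy a rule is redundant exactly when it gives the same decision the node would inherit from its parent, and a query on the pruned map walks up to the nearest node that still carries a rule. The functions `effective_access` and `prune_rules` and the path-keyed `parent` map are hypothetical.

```python
def effective_access(node, parent, rules, default=False):
    """Decision for `node`: the rule on the node itself or on its nearest
    ancestor that has one (most-specific rule wins); `default` if none."""
    while node is not None:
        if node in rules:
            return rules[node]
        node = parent[node]
    return default


def prune_rules(parent, rules, default=False):
    """Drop every rule whose decision equals the one its node would inherit
    from its parent; the effective access of every node is unchanged."""
    pruned = {}
    for node, allowed in rules.items():
        inherited = effective_access(parent[node], parent, rules, default)
        if allowed != inherited:
            pruned[node] = allowed
    return pruned


if __name__ == "__main__":
    # Hypothetical document tree: each node maps to its parent (root -> None).
    parent = {
        "/lib": None,
        "/lib/book1": "/lib",
        "/lib/book1/title": "/lib/book1",
        "/lib/book1/price": "/lib/book1",
        "/lib/book2": "/lib",
        "/lib/book2/title": "/lib/book2",
        "/lib/book2/price": "/lib/book2",
    }
    # One explicit read rule per node for a single subject (True = allowed).
    rules = {node: not node.endswith("/price") for node in parent}
    pruned = prune_rules(parent, rules)
    print(pruned)  # {'/lib': True, '/lib/book1/price': False, '/lib/book2/price': False}
    assert all(effective_access(n, parent, rules) == effective_access(n, parent, pruned)
               for n in parent)
```

In this example seven explicit rules shrink to three, while every node still receives the same decision. The walk-up lookup costs time proportional to the node's depth per query, which is the price paid for the smaller map.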

Keywords/Search Tags:

XML, Normalization, Data Redundancy, Data Compression, Query Evaluation, Access Control Rules

Related items

1	Research On Queryable XML Data Compression
2	Query evaluation in the presence of fine-grained access control
3	XML Query And Normalization Research Based On Semantics
4	Research On Queryable XML Data Compression Technology
5	Research On High Performance Redundancy Elimination Techniques For Data Backup Systems
6	Design Of Improved TCP Proxy Technology Based On WAN Data Compression
7	Research On Efficient Query With Access Control On The Ciphertext In The Cloud
8	Research On Key Technologies Of Query Supported Big RDF Data Compression
9	Research On The Compression-based Approximate Query Method For Massive Incomplete Data
10	The Research On XML Database Schema Normalization Based On Constraints