Research On Query Supporting XML Data Compression

Posted on: 2006-02-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W S Zhang
Full Text: PDF
GTID: 1118360155468798
Subject: Computer application technology
Abstract/Summary:
In recent years, XML has become an emerging standard for information exchange on the World Wide Web, and it is clear that an enormous amount of Internet data will be encoded in XML in the near future because of its extensibility and cross-platform nature. However, XML documents in their textual form are rather verbose: the textual and repetitive nature of XML tags and of several XML types consumes disk space and hinders query processing. How to efficiently compress XML data and evaluate XPath queries over the compressed data is a fundamental problem.

In this thesis, methods of XML data compression with query support are studied intensively, including XML data models, schema formalization and decomposition, similarity analysis of XML documents, frequent subtree mining, tree-grammar-based compression, and pushing queries down to the compressed data by means of signature automata.

The main research work and specific contributions of this thesis cover the following aspects:

The research history and state of the art of XML are summarized, and XML data management technologies are analyzed. The disadvantages of existing XML data compression methods are analyzed in detail, and the development directions and goals of research on XML data compression are given.

The concept of an XK-NF normal form for XML documents, based on DTD path expressions, is proposed. The advantage of this definition is that it can express a normal form with key constraints using three forms of functional dependency. A decomposition algorithm for XML schemas is proposed that reduces data redundancy according to the formalization rules, an aspect that other XML compressors do not address.

A method of compressing XML data based on tree grammars is put forward. Because redundant data appears not only within a single XML document but also across different documents, a tree-grammar-based XML compression method is proposed.
To compress XML data with query support, the raw XML documents are first grouped into clusters by a k-means clustering step. Next, within each cluster, a frequent substructure mining algorithm similar to the FP-growth method generates the compression dictionary. Finally, subtrees are replaced by bound variables and frequent substructures, following the idea of tree grammars.

The method of querying compressed data is studied. A significant portion of this thesis is devoted to querying compressed XML data, including an analysis of the indexing schemes and query methods used in other XML compressors. Queries are evaluated effectively using a signature index and signature automata, without fully decompressing the data.

A method of compressing access control rules with query support is given. To cope with duplicated access control rules, a rule pruning method based on the DAC model is proposed for XML data access control, which compresses the access control rules effectively. Furthermore, a query algorithm is presented for the compressed access control map.
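
The abstract does not give the rule format or the pruning algorithm, so the following is only a minimal sketch of one common way such pruning can work, under assumptions that are not from the thesis: each rule maps an element path to allow (`True`) or deny (`False`), a rule propagates to all descendants, the most specific rule wins, and paths without any applicable rule fall back to a default. Under these assumptions, a rule is redundant when it repeats the decision its parent would already inherit. The function names `effective_sign` and `prune_rules` and the sample paths are hypothetical.

```python
from typing import Dict


def effective_sign(path: str, rules: Dict[str, bool], default: bool = False) -> bool:
    """Answer an access query: use the rule on the path itself or, failing
    that, on its closest ancestor; fall back to the default decision."""
    parts = path.strip("/").split("/")
    for depth in range(len(parts), 0, -1):
        candidate = "/" + "/".join(parts[:depth])
        if candidate in rules:
            return rules[candidate]
    return default


def prune_rules(rules: Dict[str, bool], default: bool = False) -> Dict[str, bool]:
    """Drop every rule that only repeats the decision inherited from above.

    Assumption (not from the thesis): rules propagate downward and the most
    specific rule overrides. Paths are processed from shallow to deep so that
    the inherited decision can be read from the rules kept so far.
    """
    kept: Dict[str, bool] = {}
    for path in sorted(rules, key=lambda p: p.count("/")):
        parent = path.rsplit("/", 1)[0]
        inherited = effective_sign(parent, kept, default) if parent else default
        if rules[path] != inherited:
            kept[path] = rules[path]
    return kept


# Hypothetical rule set: three of the five rules merely restate "allow".
rules = {
    "/hospital": True,
    "/hospital/patient": True,
    "/hospital/patient/name": True,
    "/hospital/patient/diagnosis": False,
    "/hospital/staff": True,
}
pruned = prune_rules(rules)
print(pruned)
```

In this sketch the pruned map answers every query exactly as the original rule set does, because `effective_sign` is evaluated the same way on both; only the number of stored rules changes.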
Keywords/Search Tags: XML, Normalization, Data Redundancy, Data Compression, Query Evaluation, Access Control Rules