Research On Queryable XML Data Compression Technology

Posted on: 2012-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: R M Xu
Full Text: PDF
GTID: 2218330338493795
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, XML has become a standard for data representation and exchange, and more and more Web data appears in the form of XML documents. XML is widely used across industries because of characteristics such as its semi-structured form, self-description, exchangeability, and extensibility. These same characteristics, however, make XML documents verbose. The interleaving of structure and data increases the cost of storing, exchanging, and processing documents, which hinders wider and deeper use of XML databases.

XML data compression has therefore become an effective approach to XML data management. Compressing XML documents is not the final aim, however. If all of the XML data must be completely decompressed before query processing and other operations, system load increases. It is therefore necessary to support queries directly on the compressed data.

In this thesis, queryable XML data compression methods are studied in depth. To address the drawbacks of existing XML compressors, the thesis proposes two XML compression algorithms that support querying compressed XML documents directly.

First, we give the definition and a detailed construction algorithm for the Structure Sign Tree, which simplifies the structure of XML data by removing repeated paths. On this basis, a new queryable XML compressor, SSTQC (a Structure Sign Tree based Queryable Compressor), is put forward to compress XML data and organize queries. SSTQC requires only a single pass over the XML document, and it achieves excellent compression performance and better query efficiency.

Second, since most XML data compression methods cannot support Twig queries efficiently, we propose a compression algorithm, TXQC (a Twig query-supported XML Queryable Compressor), which supports Twig queries without complete decompression. Exploiting the properties of prefix codes, we obtain Twig query results by pattern matching. Compared with other XML data compression methods, TXQC is more effective for complex XML path queries.
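
The idea of simplifying XML structure by removing repeated paths can be illustrated with a small sketch. The abstract does not give the actual Structure Sign Tree definition, so the code below is only an assumed, generic path summary: it stores each distinct root-to-node tag path once and counts how often it occurs. The function names `build_path_summary` and `distinct_paths` and the sample document are hypothetical and are not taken from the thesis.

```python
import xml.etree.ElementTree as ET


def build_path_summary(xml_text):
    """Merge repeated tag paths so each distinct path is stored once.

    Illustrative only: this is a generic path summary, not the
    Structure Sign Tree algorithm defined in the thesis.
    """
    root = ET.fromstring(xml_text)
    summary = {root.tag: {"count": 1, "children": {}}}

    def merge(element, node):
        # Children with the same tag under the same path share one entry.
        for child in element:
            entry = node["children"].setdefault(
                child.tag, {"count": 0, "children": {}}
            )
            entry["count"] += 1
            merge(child, entry)

    merge(root, summary[root.tag])
    return summary


def distinct_paths(summary, prefix=""):
    """List every distinct path in the summary, in first-seen order."""
    paths = []
    for tag, node in summary.items():
        path = prefix + "/" + tag
        paths.append(path)
        paths.extend(distinct_paths(node["children"], path))
    return paths


# Hypothetical sample document with a repeated /lib/book path.
doc = (
    "<lib>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title></book>"
    "</lib>"
)
summary = build_path_summary(doc)
print(distinct_paths(summary))
```

For this sample, the two `book` elements collapse into a single `/lib/book` entry with a count of 2, so the summary holds four distinct paths for seven elements. A simple path query can then be checked against the summary before any data values are touched, which is the general motivation for querying compressed structure directly.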
Keywords/Search Tags:XML data, data compression, query processing, Structure Sign Tree, Twig query