A Query-Supporting XML Data Compression Algorithm

Posted on:2011-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:G J Yu

GTID:2208360308966965

Subject:Computer software and theory

Abstract/Summary:

In recent years, XML has been used increasingly widely for data exchange and data representation on the Internet, and more and more companies and organizations have adopted it for data exchange and storage. However, XML has a serious disadvantage: redundancy in its structural data. Alongside the valid data it represents, XML introduces a large number of repeated semi-structured element tags. This redundancy increases the bandwidth burden of transmission and the cost of storing XML data, wasting storage space unnecessarily. Compressing XML data is therefore necessary, and the compressed data should support fast queries and partial decompression. The compression and query efficiency of some existing XML compression algorithms is not satisfactory, and commercial and research XML databases are also weak in this respect, leaving considerable room for improvement. This thesis therefore analyzes the characteristics of real XML data, especially large volumes of XML data, and presents a new compression algorithm that both compresses XML data efficiently and answers user queries quickly.

First, to improve query efficiency, the thesis presents a new coding scheme for the XML structure tree, which it calls the level odd-coding method, and analyzes its distinctive properties theoretically. Second, it uses binary strings to compress structural data such as the many redundant element tag names and attribute names in an XML document, and implements this with hashing to improve query efficiency. Then, by analyzing the characteristics of real-world XML documents, especially large ones, as well as real users' query requirements on XML data, it defines and introduces the concept of Isomorphic Sub-Trees.
On this basis, the thesis proposes an index structure of Isomorphic Sub-Trees to compress the redundant structural data of an XML document and to improve query efficiency on the compressed data. To compress the data further, it presents an n-tuple combination algorithm that compresses the node encoding values, based on the properties of the level odd-coding values analyzed above. The thesis then proves and analyzes theoretically that, compared with using level odd-coding values that are not combined, n-tuple combination and the n-tuple splitting built on it (presented in the query chapter) improve query efficiency.

In addition, the thesis analyzes common requirements for content-based queries on XML data. On that basis it defines the concept of content XML data fields that are commonly queried with short keywords, and presents a compression scheme that stores these fields separately from the general content XML data. This method can reduce the average query response time.

Third, the thesis analyzes and discusses algorithms for simple path queries, branch path queries, queries with content values, and the XPath axes commonly used in XQuery, all on the Isomorphic Sub-Trees index structure and the n-tuple splitting algorithm. It also designs a buffer pool to further improve query efficiency.

Finally, the thesis compares the algorithm experimentally with several existing classical algorithms. The experimental results show that it achieves a high compression ratio and high query efficiency.
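The abstract does not give the details of the binary-string tag encoding or of the Isomorphic Sub-Trees index, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that each distinct tag or attribute name receives a fixed-width binary code stored in a hash table, and that two subtrees count as isomorphic when they have the same tags in the same shape and order, ignoring text and attributes. The function names `build_tag_codes`, `signature`, and `group_isomorphic` are hypothetical and are not taken from the thesis.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_tag_codes(root):
    """Map each distinct tag/attribute name to a fixed-width binary string.

    Assumption: codes are assigned in order of first appearance. The dict
    acts as the hash table used for lookups.
    """
    names = []
    seen = set()
    for elem in root.iter():
        for name in [elem.tag, *elem.attrib]:
            if name not in seen:
                seen.add(name)
                names.append(name)
    # Number of bits needed to distinguish all names (at least 1).
    width = max(1, (len(names) - 1).bit_length())
    return {name: format(i, f"0{width}b") for i, name in enumerate(names)}


def signature(elem, codes):
    """Structural signature: the tag code followed by the child signatures.

    Assumption: text and attributes are ignored, child order matters.
    """
    children = "".join(signature(child, codes) for child in elem)
    return f"{codes[elem.tag]}({children})"


def group_isomorphic(root, codes):
    """Group non-leaf subtrees that share a signature (two or more members)."""
    groups = defaultdict(list)
    for elem in root.iter():
        if len(elem):  # skip leaf elements
            groups[signature(elem, codes)].append(elem)
    return {sig: elems for sig, elems in groups.items() if len(elems) > 1}


# Small illustrative document (made up for this sketch).
doc = (
    "<lib>"
    "<book><title>A</title><year>1</year></book>"
    "<book><title>B</title><year>2</year></book>"
    "</lib>"
)
root = ET.fromstring(doc)
codes = build_tag_codes(root)
groups = group_isomorphic(root, codes)

for sig, elems in groups.items():
    print(sig, len(elems))
```

In this sketch the two `book` subtrees share one signature, so their structure could be stored once and referenced from each occurrence, which is the kind of redundancy an index over isomorphic subtrees is meant to remove. The actual index layout, the level odd-coding of nodes, and the n-tuple combination step described in the abstract are not modeled here.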

Keywords/Search Tags:

XML, compression, query, isomorphic subtree, n-tuple

Related items

1	The Research On Frequent Subtrees Mining And Corresponding Techniques
2	Strategy Of Optimized Query In Frequent Subtree
3	The Research And Design Of Full Reducer Algorithm Based On Data Compression In Distributed Query Optimization
4	A Connection And Combination Based Research For Subtree Mining
5	Research On Embedded Frequent Subtree Mining
6	Research On Self-organizing Tuple Reconstruction For In-memory Column Database
7	The Research On Query Optimization Of Native XML Database
8	A Method Of Cross Domain QoE Guarantee Based On Isomorphic Flows For Network Multimedia Traffic
9	Resolution-based Automated Reasoning In Linguistic 2-Tuple
10	Embedded And Export Of Frequent Subtree Mining Algorithm