With the rapid development of computer network technology, the demand for sharing information and exchanging data has grown rapidly. XML (eXtensible Markup Language), a semi-structured description language, has practically become the standard for describing data on the Internet and exchanging data among different systems because of its extensibility, self-description, simplicity, and openness. However, because XML is self-describing, data and structural information are interleaved, which wastes considerable storage and network transmission bandwidth. Efficient compression of XML files has therefore become a research focus.

The objective of this study is to propose XMLPre, a compression algorithm based on XML file preprocessing, with the aim of reducing storage and network bandwidth requirements. To better motivate the design of XMLPre, the preceding chapters introduce the relevant concepts, information models, and application structures of XML, and discuss common dictionary-based algorithms including LZ77, DEFLATE, and LZMA. The design of XMLPre is then explained in detail: XMLPre divides XML data into containers by drawing on XMill's container division theory, parsing the XML with SAX, and defining a formula for container division. XMLPre divides the data in an XML document into four containers, each storing semantically related data; the BWT (Burrows-Wheeler transform) is then applied to each container; finally, the data in each container are compressed with the LZMA algorithm and written to files.

It is concluded that XMLPre achieves a compression ratio 1%-3% better than other common compression algorithms in the same tests and measurements.
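
The three stages described above (SAX-driven container division, a per-container BWT, and LZMA compression) can be illustrated with a minimal Python sketch. This is not the XMLPre implementation: the abstract does not state which four containers XMLPre uses or what its division formula is, so the container names below (`structure`, `attr_names`, `attr_values`, `text`), the helper names `ContainerHandler`, `bwt`, and `compress_xml`, and the newline-joined serialization are all assumptions made for illustration. The BWT shown is the naive rotation-sorting variant, which is suitable only for small inputs.

```python
import lzma
import xml.sax


class ContainerHandler(xml.sax.ContentHandler):
    """Route SAX events into four containers (a hypothetical split)."""

    def __init__(self):
        super().__init__()
        self.containers = {
            "structure": [],    # element open/close markers
            "attr_names": [],   # attribute names
            "attr_values": [],  # attribute values
            "text": [],         # character data
        }

    def startElement(self, name, attrs):
        self.containers["structure"].append("<" + name)
        for key in attrs.getNames():
            self.containers["attr_names"].append(key)
            self.containers["attr_values"].append(attrs.getValue(key))

    def endElement(self, name):
        self.containers["structure"].append(">")

    def characters(self, content):
        # SAX may deliver text in several chunks; each non-blank chunk
        # is stored as its own entry in this simplified sketch.
        stripped = content.strip()
        if stripped:
            self.containers["text"].append(stripped)


def bwt(data):
    """Naive Burrows-Wheeler transform: sort all rotations of `data`.

    Returns (last_column, primary_index), where primary_index is the row
    of the original string in the sorted rotation table.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last_column = bytes(doubled[i + n - 1] for i in order)
    return last_column, order.index(0)


def compress_xml(xml_bytes):
    """Split XML into containers, apply BWT to each, then LZMA-compress."""
    handler = ContainerHandler()
    xml.sax.parseString(xml_bytes, handler)
    result = {}
    for name, items in handler.containers.items():
        raw = "\n".join(items).encode("utf-8")
        transformed, primary = bwt(raw)
        result[name] = (primary, lzma.compress(transformed))
    return result


if __name__ == "__main__":
    sample = b'<book id="1"><title>XML</title><title>LZMA</title></book>'
    for name, (primary, blob) in compress_xml(sample).items():
        print(name, primary, len(blob))
```

Grouping semantically related data before transformation is the point of the design: each container holds more homogeneous content than the interleaved document, which is what the BWT and the dictionary coder can exploit. A real decoder would also need the inverse BWT and enough structural information to reassemble the document, both of which are omitted here.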