
Design And Implementation Of A Fast Non-extractive XML Parser

Posted on: 2011-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Zhang
GTID: 2178360305476432
Subject: Computer software and theory
Abstract/Summary:
As a platform- and language-neutral markup language, XML plays an important role in data representation and data exchange over the Internet. However, XML parsing performance remains a pressing problem. Most current research is based on the XML DOM, the most widely used XML parsing model. This paper presents a fast non-extractive XML parser based on VTD-XML, called NEM-XML.

First, NEM-XML is a non-extractive XML parsing model: it does not create a node object for every XML node during parsing. Instead, it encodes node information in 64-bit integers. This saves a large amount of memory and improves parsing performance. To gain flexibility and usability, NEM-XML keeps the structure information in a static linked array, a data structure that significantly eases updating and navigation among XML nodes.

Second, reusing XML parsing results is promising where the XML document does not need to be updated, since it avoids parsing the same document repeatedly. This paper further modifies NEM-XML to reduce the space needed to save the parsing results and the time needed to restore them.

Finally, parallel computing is an active research field, and parallel XML parsing is becoming increasingly popular. This paper proposes a restricted XML partition method that reduces the uncertainty of each chunk. The method takes both document structure and load balancing into consideration, and the partition results are satisfactory.

This work on XML parsing technology has practical significance. On the one hand, it extends VTD-XML to make it more flexible and studies the reuse of XML parsing results in depth, which promotes the application of XML in various fields to some extent. On the other hand, the partition algorithm presented here is efficient and provides a reference for related research on parallel XML parsing.
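The idea of encoding node information in 64-bit integers, rather than extracting node objects, can be illustrated with a minimal Python sketch. The field widths, the token-type constant, and the function names below are assumptions chosen for illustration, loosely modeled on the kind of record VTD-XML uses; the abstract does not state the layout NEM-XML actually adopts.

```python
# Hypothetical field widths (sum = 64 bits); not taken from the thesis.
TYPE_BITS, DEPTH_BITS, LENGTH_BITS, OFFSET_BITS = 4, 8, 20, 32

# Bit positions of each field, counted from the least significant bit.
LENGTH_SHIFT = OFFSET_BITS
DEPTH_SHIFT = LENGTH_SHIFT + LENGTH_BITS
TYPE_SHIFT = DEPTH_SHIFT + DEPTH_BITS

TOKEN_START_TAG = 0  # hypothetical token-type code


def pack(token_type: int, depth: int, length: int, offset: int) -> int:
    """Pack one token's description into a single 64-bit integer."""
    for value, bits in ((token_type, TYPE_BITS), (depth, DEPTH_BITS),
                        (length, LENGTH_BITS), (offset, OFFSET_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit in its bit width")
    return ((token_type << TYPE_SHIFT) | (depth << DEPTH_SHIFT)
            | (length << LENGTH_SHIFT) | offset)


def unpack(record: int) -> tuple[int, int, int, int]:
    """Recover (token_type, depth, length, offset) from a packed record."""
    offset = record & ((1 << OFFSET_BITS) - 1)
    length = (record >> LENGTH_SHIFT) & ((1 << LENGTH_BITS) - 1)
    depth = (record >> DEPTH_SHIFT) & ((1 << DEPTH_BITS) - 1)
    token_type = (record >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    return token_type, depth, length, offset


def token_text(document: bytes, record: int) -> bytes:
    """Non-extractive access: slice the original buffer on demand."""
    _, _, length, offset = unpack(record)
    return document[offset:offset + length]


doc = b"<a><b>hi</b></a>"
# The element name "b" starts at byte 4, is 1 byte long, at nesting depth 1.
record = pack(TOKEN_START_TAG, depth=1, length=1, offset=4)
```

In this sketch the document bytes are never copied into per-node objects; each token is a single integer pointing back into the original buffer, which is where the memory saving described above would come from.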
Keywords/Search Tags: XML Parsing, VTD-XML, Non-extractive, Reusability, Parallel Computing