A Hybrid Parallel Processing For Large XML Parsing With Multiple CPUs And GPGPUs

Posted on: 2015-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: P Liu
Full Text: PDF
GTID: 2348330479454377
Subject: Software engineering
Abstract/Summary:
Documents written in XML, a semi-structured language, are increasingly used to transport and store data such as online data, logs, configuration files, content-based databases, and company documents. An XML file is usually processed by scanning the whole file sequentially to analyze its elements and structure. When the file becomes very large, however, this is a serious problem, because analyzing the file from beginning to end is prohibitively slow for general processing methods. A number of approaches have been used to address these performance concerns, but none of them applies GPGPUs to processing XML documents.

In this paper, we propose the Hybrid Parallel XML Processing (HPXP) algorithm, which processes large-scale XML files on a GPU cluster to address the performance issue. The system coordinates CPUs and GPGPUs in a master-slave architecture to process the XML file. Processing consists of two phases: structure extraction and tag parsing. Structure extraction uses multiple threads to read the file and recognize the document structure. Tag parsing then uses the GPGPU to obtain every tag's name and attributes, based on the location information produced in the structure extraction phase.

Our algorithm overcomes some defects of previous parallel methods for processing XML and shows that GPGPUs are suitable for processing XML. In our experiments, the method achieves a 1.5x speedup over the SAX method.
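
The two phases described above can be illustrated with a minimal CPU-only sketch in Python. It is not the HPXP implementation: the function names (`find_tag_spans`, `extract_structure`, `parse_tag`, `parse_document`), the chunking scheme, and the attribute regular expression are all assumptions made for illustration. Phase 2 is written as an independent per-tag function, which is the kind of data-parallel step the abstract assigns to the GPGPU; here it simply runs on the CPU. The sketch also ignores comments, CDATA sections, processing instructions, and `>` characters inside attribute values.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical attribute pattern: name="value" or name='value'.
ATTR_RE = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def find_tag_spans(text, start, end):
    """Phase 1 worker: locate every tag whose '<' lies in text[start:end]."""
    spans = []
    pos = text.find("<", start, end)
    while pos != -1:
        # The closing '>' may lie beyond `end`, inside the next chunk.
        close = text.find(">", pos)
        if close == -1:
            break
        spans.append((pos, close + 1))
        pos = text.find("<", pos + 1, end)
    return spans


def extract_structure(text, workers=4):
    """Phase 1: split the text into chunks and scan them with several threads."""
    chunk = max(1, -(-len(text) // workers))  # ceiling division
    ranges = [(i, min(i + chunk, len(text))) for i in range(0, len(text), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: find_tag_spans(text, *r), ranges))
    # pool.map preserves chunk order, so the spans stay in document order.
    return [span for part in parts for span in part]


def parse_tag(text, span):
    """Phase 2: read one tag's name and attributes from its (start, end) span."""
    start, end = span
    body = text[start + 1:end - 1].strip()
    if body.startswith("/"):
        return {"kind": "close", "name": body[1:].strip(), "attrs": {}}
    kind = "open"
    if body.endswith("/"):
        kind = "empty"
        body = body[:-1].strip()
    parts = body.split(None, 1)
    name = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    attrs = {}
    for m in ATTR_RE.finditer(rest):
        attrs[m.group(1)] = m.group(2) if m.group(2) is not None else m.group(3)
    return {"kind": kind, "name": name, "attrs": attrs}


def parse_document(text, workers=4):
    """Run both phases: structure extraction, then per-tag parsing."""
    spans = extract_structure(text, workers)
    # Each tag is parsed independently of the others, so this map could be
    # distributed across many threads or, as in the thesis, GPU threads.
    return [parse_tag(text, span) for span in spans]
```

Calling `parse_document` on a short string such as `<log><entry id="1"/></log>` returns one dictionary per tag, in document order. Because each worker only claims tags whose `<` falls in its own chunk, a tag that straddles a chunk boundary is reported exactly once. Python threads will not give a real speedup for this CPU-bound scan; the sketch only shows how the work is divided.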
Keywords/Search Tags: XML, CUDA, GPGPU, Performance