Research And Implementation Of Distributed Twig Query Processing Over Massive XML Documents In The Cloud

Posted on:2015-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:P Zhang

GTID:2308330482455549

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, XML, a semi-structured data format, has become an important standard for data storage and exchange on the Internet. As the volume of XML documents grows rapidly, query processing over XML data has become an increasingly active research direction. One of the core operations in XML querying is the twig pattern query, but traditional twig algorithms cannot be applied to query processing over massive XML documents. Because little previous work addresses distributed twig query processing, how to efficiently process twig queries over massive XML data has become a new research topic.

Cloud computing is currently the main technology for processing massive data. Research on massive data management in cloud computing includes data partitioning and distributed query processing. This thesis focuses on XML data partitioning and distributed query processing in order to implement twig query processing over massive XML documents in a cloud computing environment. The thesis first presents a random fragmentation strategy for XML documents named AF (Arbitrarily Fragmentation), which keeps fragmentation arbitrary by recording node information, so that partitioning is not constrained by the document's structural information. It then proposes a cloud-based distributed query processing algorithm, DTS (DTwigStack), which effectively handles twig query processing over massive XML data. The AF segmentation algorithm uses dividing-node information, so that all fragments can be processed in parallel and the intermediate query results combined into final partial results. To guarantee that all intermediate results related to the same final result are gathered in the same reducer, the ComMapReduce framework, which adds a Coordinator node, is introduced. DTS uses the Coordinator to collect key information from all sites and then returns the summarized global key information to DTS on each mapper.

This thesis designs a series of experiments that evaluate DTS performance with varying numbers of Hadoop slave nodes and varying dataset sizes. Speedup, sizeup, and scaleup are also analyzed from the experimental results. The performance evaluation verifies both the effectiveness and the efficiency of DTS for twig query processing over massive XML documents.
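
For readers unfamiliar with the term, a twig pattern query is a small tree-shaped pattern that is matched against an XML document. The sketch below is only a single-machine illustration of that idea, not the DTS or TwigStack algorithm from the thesis. The sample document and the names `TwigNode`, `matches`, and `match_twig` are hypothetical, introduced here for illustration. It models only ancestor-descendant edges and reports the elements that match the pattern root rather than enumerating every combination of matching nodes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class TwigNode:
    """One node of a twig pattern: a tag plus descendant-edge children."""
    tag: str
    children: list = field(default_factory=list)


def matches(elem, pattern):
    """Return True if elem has the pattern's tag and every child pattern
    matches at least one proper descendant of elem."""
    if elem.tag != pattern.tag:
        return False
    for child_pattern in pattern.children:
        # elem.iter() yields elem itself first, so exclude it explicitly.
        descendants = (d for d in elem.iter() if d is not elem)
        if not any(matches(d, child_pattern) for d in descendants):
            return False
    return True


def match_twig(root, pattern):
    """Return all elements in document order that match the pattern root."""
    return [e for e in root.iter() if matches(e, pattern)]


# Hypothetical sample document (an assumption, not data from the thesis).
DOC = (
    "<lib>"
    "<book><title>XML</title><author><name>Ann</name></author></book>"
    "<book><title>Cloud</title></book>"
    "</lib>"
)

# Twig pattern equivalent to the XPath expression //book[.//title][.//name]
pattern = TwigNode("book", [TwigNode("title"), TwigNode("name")])

root = ET.fromstring(DOC)
hits = match_twig(root, pattern)
titles = [book.findtext("title") for book in hits]
print(titles)
```

Only the first `book` element satisfies both branches of the pattern. This naive matcher revisits subtrees repeatedly, which is one reason it does not scale to very large documents.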

Keywords/Search Tags:

cloud computing, distributed computing, mass data, xml, twig query processing

Related items

1	Research Of Mass Data Processing In The Telecom Business Analysis Support System (BASS) Based On Cloud Computing Platform
2	Research On Key Technologies Of Distributed Rank-aware Query Processing
3	GPU Computing In Massive Data Processing
4	Research On Privacy-Preserving Graph Data Processing Techniques In The Cloud
5	Research On Indexing And Query Processing In Cloud Computing Systems
6	Research On Query Processing Technology For XML Data Based On Holistic Twig Pattern
7	Research On Key Technologies Of The MPI-based High Performance Cloud Computing Platform
8	Research On Real-Time Query Processing In Cloud Computing For Terms In Data Streams
9	Research On Key Problems Of Efficient Processing Of Big Data In Cloud Computing
10	Research On Collaborative Wireless Sensor Network Architecture Based On Distributed Mass Data Processing Technology