Design and Implementation of an XML Keyword Search Algorithm on Large Clusters

Posted on: 2012-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: M J Zhou
Full Text: PDF
GTID: 2208330335965452
Subject: Computer software and theory
Abstract/Summary:
XML (eXtensible Markup Language) is a de facto standard on the Internet, valued for its simplicity, scalability, and flexibility. The broad use of XML and the large number of Internet users have led to rapid growth in XML data, and managing massive XML data sets has become a practical concern.

Hadoop is a distributed computing system based on the Map/Reduce framework. It runs on clusters of inexpensive PCs and supports large-scale parallel computation in the Java programming language. Hadoop is well suited to massive computation because it splits documents into data blocks and distributes them to the computation nodes.

In this thesis, we focus on XML data management on a Map/Reduce cluster and discuss each part of XML keyword search in detail, including storage, indexing, and querying. Keyword search over large-scale data is currently an active research topic. We implement keyword search under the Map/Reduce framework on Hadoop and design an algorithm that processes queries on XML data in a distributed environment. The main steps of the algorithm are XML data partitioning, parallel encoding, index construction, and SLCA (smallest lowest common ancestor) computation. We conduct extensive experiments to evaluate the effectiveness of the proposed method.
Keywords/Search Tags:XML, Keyword Search, Hadoop, HDFS, Map/Reduce
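
The index construction and SLCA computation steps named in the abstract can be illustrated with a small single-machine sketch. It is not the thesis's Map/Reduce algorithm. It assumes Dewey-style labels (the path of child positions from the root) as the node encoding, which the abstract does not specify, and it uses a brute-force SLCA computation that is only practical for tiny inputs. The names `build_index`, `lca`, `slca`, and `DOC` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import product

# Tiny illustrative document (made up for this sketch).
DOC = (
    "<bib>"
    "<book><title>XML search</title><author>Zhou</author></book>"
    "<book><title>Hadoop</title><author>Zhou</author></book>"
    "<paper><title>XML</title></paper>"
    "</bib>"
)

def build_index(xml_text):
    """Inverted index: lowercase word -> set of Dewey labels (tuples).

    Assumption: a word is attributed to the element that directly
    contains it; the root element gets the label (0,).
    """
    index = defaultdict(set)

    def visit(node, label):
        for word in (node.text or "").lower().split():
            index[word].add(label)
        for pos, child in enumerate(node):
            visit(child, label + (pos,))
            # Text following a child still belongs to the current element.
            for word in (child.tail or "").lower().split():
                index[word].add(label)

    visit(ET.fromstring(xml_text), (0,))
    return index

def lca(a, b):
    """Lowest common ancestor of two Dewey labels: their common prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]

def slca(index, keywords):
    """Brute-force SLCA: the LCAs of all keyword-match combinations,
    keeping only those with no proper descendant that is also an LCA."""
    lists = [index.get(k.lower(), set()) for k in keywords]
    if not lists or not all(lists):
        return []  # some keyword never occurs
    candidates = set()
    for combo in product(*lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)

    def is_proper_ancestor(a, b):
        return len(a) < len(b) and b[:len(a)] == a

    return sorted(
        c for c in candidates
        if not any(is_proper_ancestor(c, d) for d in candidates)
    )

index = build_index(DOC)
print(slca(index, ["XML", "Zhou"]))
```

For the query `XML Zhou`, the first `book` element is the only node that contains both keywords without having a descendant that also contains both, so it is the single SLCA; the root `bib` also contains both keywords but is discarded because it is an ancestor of that `book`. In a distributed setting the index lists would be partitioned across nodes, which this sketch does not attempt.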