XML Keyword Search Algorithm Based On Hadoop

Posted on: 2017-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Li
Full Text: PDF
GTID: 2348330488472010
Subject: Computer Science and Technology
Abstract/Summary:
Extensible Markup Language (XML), derived from the Standard Generalized Markup Language (SGML), is a set of rules for defining semantic markup. It plays an increasingly important role in Web programming and is becoming a core technology for e-commerce operations and data management. To extract information from massive XML data, many XML query methods have emerged, which has made XML data querying an active research topic in related fields. Cloud computing offers programming models that parallelize programs. Put simply, massive data is distributed and stored across a cluster of many machines. The cluster can be built from a large number of low-cost computers rather than high-end machines, which saves considerable resources. As a mainstream cloud computing platform, Hadoop has attracted wide attention from researchers. Because it is convenient and simple to use, Hadoop lets users write parallel code easily, which makes large-scale XML keyword queries tractable. This thesis studies XML keyword query algorithms in depth and proposes a parallel scheme for the query algorithm that uses Hadoop as the computing platform. The main research work is as follows:

(1) To address the problems in current XML keyword query algorithms, this thesis proposes an intelligent grouping scheme based on the distribution of Dewey codes, which groups the elements of the Dewey code set before the query starts. The grouping scheme is built on the index search algorithm, and the resulting algorithm is called Intelligent Indexed Lookup Eager (IILE). Comparative experiments show that the proposed intelligent grouping index query algorithm is highly efficient.

(2) To address the excessive runtime of processing massive data on a single machine, the requirements of large-scale data processing, and the data block mechanism of the Hadoop environment, this thesis further analyzes the characteristics of the IILE algorithm and gives a property, based on the decomposition and combination of SLCA (smallest lowest common ancestor) results, that is favorable for distributed computing. Based on this property, the thesis puts forward a parallel XML keyword query scheme designed and implemented on the MapReduce programming model. Experimental results show that the proposed parallel scheme executes massive XML keyword queries efficiently on the Hadoop platform.
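
To make the terms above concrete, the following is a minimal Python sketch of SLCA computation over Dewey codes and of a split-and-merge evaluation in the spirit of MapReduce. It is not the thesis's IILE algorithm or its Hadoop implementation, neither of which the abstract specifies. It uses a brute-force SLCA and relies on a property known from the SLCA literature: if the first keyword's node list is split into blocks, the overall SLCA set equals the union of the per-block SLCA sets with ancestors removed. Whether this matches the decomposition property used in the thesis is an assumption. All function names and the sample Dewey codes are hypothetical.

```python
from itertools import product


def parse_dewey(code):
    """Turn a Dewey code such as '0.1.2' into a tuple of ints."""
    return tuple(int(part) for part in code.split("."))


def lca(*nodes):
    """Lowest common ancestor of Dewey codes = their longest common prefix."""
    prefix = []
    for parts in zip(*nodes):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def is_ancestor(a, b):
    """True if node a is a proper ancestor of node b."""
    return len(a) < len(b) and b[:len(a)] == a


def remove_ancestors(nodes):
    """Keep only nodes that have no proper descendant in the set."""
    nodes = set(nodes)
    return sorted(n for n in nodes
                  if not any(is_ancestor(n, m) for m in nodes))


def slca(keyword_lists):
    """Brute-force SLCA: LCA of every match combination, then prune ancestors.

    This is a baseline for illustration only, not the IILE algorithm.
    """
    candidates = {lca(*combo) for combo in product(*keyword_lists)}
    return remove_ancestors(candidates)


def parallel_slca(keyword_lists, num_blocks):
    """Simulate a map/reduce split on the first keyword list (assumed scheme)."""
    first, rest = keyword_lists[0], keyword_lists[1:]
    # Split the first keyword's matches into blocks, as a stand-in for data blocks.
    blocks = [first[i::num_blocks] for i in range(num_blocks)]
    # "Map": compute a partial SLCA set for each non-empty block.
    partials = [slca([block] + rest) for block in blocks if block]
    # "Reduce": merge partial results and prune ancestors once more.
    merged = [node for part in partials for node in part]
    return remove_ancestors(merged)


# Hypothetical match lists: Dewey codes of nodes containing each keyword.
xml_nodes = [parse_dewey(c) for c in ["0.0.0", "0.1.1", "0.2"]]
hadoop_nodes = [parse_dewey(c) for c in ["0.0.1", "0.1.2.0"]]

sequential = slca([xml_nodes, hadoop_nodes])
distributed = parallel_slca([xml_nodes, hadoop_nodes], num_blocks=2)
print(sequential)
print(distributed)
```

In this sample, the root `0` contains both keywords but is pruned because its descendants `0.0` and `0.1` also contain both, so the sequential and split-and-merge evaluations return the same two nodes.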
Keywords/Search Tags:Cloud Computing, XML, Keyword Search, Intelligent Grouping, MapReduce