
Research And Implementation Of XML Labeling And Querying Algorithms Based On Map Reduce

Posted on: 2017-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: B W Wei
GTID: 2308330503959695
Subject: Computer application technology
Abstract/Summary:
XML's self-describing and extensible nature has made it a primary data format on the web, and its use keeps growing. Individual XML documents are becoming ever larger, and XML data is semi-structured; together these factors make structured relational databases poorly suited to query processing over XML documents. How to support XML queries effectively, and in particular how to satisfy specific query semantics and return the corresponding results, has become a current research focus.

This thesis carries out the entire study on MapReduce, a distributed computing framework used on the Hadoop big data platform. The platform can be deployed on a cluster of inexpensive PCs, and data is automatically distributed across the cluster nodes so that it can be processed in parallel, which is why MapReduce is used here for querying XML data. To query an XML document, much existing work converts the document into an inverted tree and assigns a label to every node; most of this work uses prefix labels. With prefix labels, however, the label length of a node keeps growing as the document tree grows, and the resulting set of labeled nodes takes up considerable storage space. Moreover, such labels only allow node positions to be compared within a subtree: the relationship is analyzed level by level in the XML tree, and a node's absolute position in the whole tree cannot be determined. As a result, XML query algorithms based on this kind of label are generally not efficient.

To address these problems with prefix labels, we propose a new label, Xwei. Unlike common prefix labels, it labels each node with its pre-order position in the document tree, and each labeled node retains only its relative parent-child relationship, so the depth of the tree does not affect the size of the labeled data set. Pre-order labeling preserves the complete structure of the XML tree. Using the MapReduce mechanism, we can cut out part of the XML tree's nodes to obtain a small data set, use the indirect parent-child relationships among those nodes, and design corresponding query algorithms that return results satisfying the query semantics.

All experiments in this thesis were run on MapReduce, with the prefix labels Dewey and ED as baselines, and many comparative experiments were carried out on both labeling and querying. The labeling experiments show that when the document tree is shallow, Xwei's labeling efficiency is 10% higher than that of Dewey and ED; as the depth of the document tree increases, the improvement of Xwei over Dewey and ED reaches 25% or more. The querying experiments are based on the SLCA and ELCA query semantics: Dewey and ED query nodes with the LISA algorithm, while for Xwei the query algorithms are designed around the properties of its labels. The results show that the Xwei-based query algorithm is an efficient XML query algorithm that greatly reduces the number of query steps and improves query efficiency. In the comparative experiments under both query semantics, Xwei's query efficiency is 75% higher than that of Dewey and ED.
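
The contrast between prefix labels and pre-order labels can be illustrated with a small single-process Python sketch. The abstract does not give the exact Xwei label layout, so the `(pre-order number, parent number)` pair used here is an assumption made for illustration only, and the class and function names (`Node`, `dewey_labels`, `preorder_labels`, `is_ancestor`, `sample_tree`, `chain`) are hypothetical. The sketch does not involve MapReduce or Hadoop.

```python
class Node:
    """Minimal XML-like tree node (hypothetical helper, not from the thesis)."""
    def __init__(self, tag, children=None):
        self.tag = tag
        self.children = children or []


def dewey_labels(root):
    """Prefix (Dewey-style) labels: each label is the full path of child ranks,
    so the label length equals the node's depth."""
    labels = []

    def visit(node, path):
        labels.append((node.tag, path))
        for rank, child in enumerate(node.children, start=1):
            visit(child, path + (rank,))

    visit(root, (1,))
    return labels


def preorder_labels(root):
    """Assumed pre-order labels: (tag, pre-order number, parent's number).
    Every label has the same size regardless of depth; 0 means 'no parent'."""
    labels = []

    def visit(node, parent_id):
        node_id = len(labels) + 1  # next pre-order number
        labels.append((node.tag, node_id, parent_id))
        for child in node.children:
            visit(child, node_id)

    visit(root, 0)
    return labels


def is_ancestor(parent_of, ancestor_id, node_id):
    """Follow parent pointers upward from node_id until the root is passed."""
    current = parent_of[node_id]
    while current:
        if current == ancestor_id:
            return True
        current = parent_of[current]
    return False


def sample_tree():
    """bib -> book(title, author), book(title)"""
    return Node("bib", [
        Node("book", [Node("title"), Node("author")]),
        Node("book", [Node("title")]),
    ])


def chain(depth):
    """A degenerate tree with one node per level, to show label growth."""
    root = current = Node("n1")
    for level in range(2, depth + 1):
        child = Node(f"n{level}")
        current.children.append(child)
        current = child
    return root


if __name__ == "__main__":
    for tag, path in dewey_labels(sample_tree()):
        print(tag, ".".join(map(str, path)))
    for tag, node_id, parent_id in preorder_labels(sample_tree()):
        print(tag, node_id, parent_id)
```

In this sketch the Dewey label of the deepest node in `chain(50)` has 50 components, while every pre-order label remains a pair of integers, which matches the abstract's point that tree depth does not affect the size of the labeled data set. Ancestor relationships are recovered indirectly by walking parent numbers, for example with `parent_of = {node_id: parent_id for _, node_id, parent_id in preorder_labels(tree)}`.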
Keywords/Search Tags: Xwei, XML, Map Reduce, Dewey, ED