
Join Method Research Based On MapReduce

Posted on: 2015-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q K Guo
Full Text: PDF
GTID: 2268330428490983
Subject: Computer software and theory
Abstract/Summary:
The rapid development of network and cloud computing technology has led to explosive growth in global data. This data is figuratively called Massive Data or Big Data. The value hidden behind the data has grown at the same time: it can not only provide decision support and business opportunities for the enterprise that owns the data, but also support more convenient, intelligent, and efficient services. Big data includes more data types and more complex data structures. A variety of structured, semi-structured, and unstructured data has been generated by a variety of application environments. Humanity is entering the era of big data.

Against this background, the value of data has received unprecedented attention, and more and more people are turning their attention to big data analysis and processing. Traditional relational data management and analysis technology and parallel computing technology cannot meet the challenges brought by big data because of their own limitations. Therefore, new theories and technologies are needed to support big data analysis and processing.

As a representative of the new programming models for data-intensive computing, MapReduce has played an irreplaceable role in big data analysis and processing because of its good scalability, high fault tolerance, and low cost. However, MapReduce does not directly support the join operation, which makes it difficult to analyze and process relational data. Join is one of the basic operations of relational algebra and a fundamental means of relational data analysis and processing.

Existing MapReduce-based join methods mostly address only equi-join. But simple equi-join cannot complete in-depth analysis jobs; more complicated join types such as theta-join and cross product are also needed. Few studies focus on theta-join, and those that do either lack detailed description, are difficult to understand and implement, or cannot adapt to a changing computing environment.

For these reasons, this paper proposes a simple and effective method for processing theta-join using MapReduce. It is simple in that it is easy to understand and described in detail; it is effective in that it can set the number of Reducers according to different inputs to adapt to a changing computing environment. The method is called Adaptive Share MapReduce Theta (ASMRT), a MapReduce-based adaptive share theta-join algorithm. It consists of two parts, MapReduce Theta (MRT) and Adaptive Share (AS). The AS algorithm calculates the share of every dataset and the number of Reducers according to the cardinality of every dataset. The MRT algorithm processes the theta-join according to the shares of the datasets and the number of Reducers. The theoretical model of the MRT algorithm, the MRT model, uses a variable that has no relationship with the join records to perform logical partitioning. This not only allows theta-joins with any condition to be processed within the partition logic of MapReduce, but also avoids the data skew caused by the naturally uneven distribution of keys in the datasets.

To illustrate the feasibility and effectiveness of the proposed algorithm, this paper implements the ASMRT algorithm, analyzes the execution process of MRT from the perspective of relational algebra, and analyzes AS using representative examples. The results show that the algorithm can use one MapReduce procedure to process multi-way theta-joins with arbitrary conditions simply and efficiently.
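The abstract does not give the details of AS or MRT, so the following is only a minimal in-memory sketch of the general idea it describes, restricted to a two-way join. It assumes that a "share" is the number of partitions a dataset is split into, that shares are chosen to minimise the number of records each Reducer receives, and that the variable unrelated to the join records is a random partition index. The names `choose_shares` and `theta_join` and the cost model are hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def choose_shares(card_r, card_s, max_reducers):
    """Pick shares (a, b) with a * b <= max_reducers.

    Assumed cost model (not from the thesis): each Reducer receives about
    card_r / a records of R and card_s / b records of S, so we minimise
    that sum over all feasible (a, b).
    """
    if max_reducers < 1:
        raise ValueError("max_reducers must be at least 1")
    best = None
    for a in range(1, max_reducers + 1):
        b = max_reducers // a
        cost = card_r / a + card_s / b
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]


def theta_join(r, s, theta, max_reducers, seed=0):
    """Simulate one map/reduce round for R join S under predicate theta."""
    rng = random.Random(seed)
    a, b = choose_shares(len(r), len(s), max_reducers)

    # Map phase: the partition index is random, so it is independent of
    # the record contents and key skew cannot overload a single Reducer.
    buckets = defaultdict(lambda: ([], []))
    for rec in r:
        i = rng.randrange(a)          # random row for this R record
        for j in range(b):            # replicate across all columns
            buckets[(i, j)][0].append(rec)
    for rec in s:
        j = rng.randrange(b)          # random column for this S record
        for i in range(a):            # replicate across all rows
            buckets[(i, j)][1].append(rec)

    # Reduce phase: Reducer (i, j) checks theta on its local pairs only.
    # Every (r, s) pair meets at exactly one Reducer, so no duplicates.
    out = []
    for r_part, s_part in buckets.values():
        out.extend((x, y) for x in r_part for y in s_part if theta(x, y))
    return out
```

Because each R record is assigned to one row and each S record to one column of an a-by-b grid of Reducers, any pair of records meets at exactly one Reducer, which is why an arbitrary predicate such as `lambda x, y: x < y` can be evaluated without a join key. A multi-way join would need one share per dataset and a grid with one dimension per dataset, which this sketch does not attempt.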
Keywords/Search Tags: Big Data, MapReduce, Theta-Join, Partition, Cloud Computing