Multiple Datasets Joins Based On Time Cost Evaluation Model For Distributed System

Posted on: 2016-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: L B Xia
GTID: 2348330479453407
Subject: Computer software and theory
Abstract/Summary:
Distributed computing provides a new platform for big data analysis and processing. MapReduce is an important programming model that is often used to process large datasets in parallel or distributed computing environments. However, because of limitations of this programming model, join operations in MapReduce are inefficient when multiple datasets are involved. Improving the existing MapReduce-based methods for joining multiple datasets is therefore important for the efficiency of data query and analysis.

Considering the time cost of join processing, sorting, and compression in a MapReduce job, a time cost evaluation model is extended to calculate the time cost of a MapReduce job. To make the model more useful, a way of estimating the size of join results with a probability distribution function is presented.

A new method is designed to handle the multi-join problem using the time cost model, a greedy strategy, and dynamic programming. First, some equi-joins are processed to reduce the scale of the non-equi-joins; next, all non-equi-joins are processed by multi-way theta-join or by two MRJs (MapReduce jobs); finally, the task is decomposed into several subtasks according to time cost, and optimal schemes for each subtask are obtained by greedy and dynamic programming. The new method reduces the cost of processing a task by breaking the task down and choosing appropriate join methods for the subtasks.

Extensive experiments on Hadoop show that the new method improves the efficiency of join operations during task execution and is more efficient than common approaches such as Hive and Pig.
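The planning idea in the abstract, choosing how to combine datasets by dynamic programming under a cost model, can be illustrated with a minimal sketch. This is not the thesis's model. The cost used here (the sum of estimated intermediate result sizes) is only a stand-in for the time cost of a MapReduce job, and the names `cardinality`, `best_plan`, `sizes`, and `sel` are hypothetical. Result sizes are estimated with per-pair selectivities, which is likewise an assumption made for illustration.

```python
from itertools import combinations


def cardinality(subset, sizes, sel):
    """Estimated rows after joining every dataset in `subset`.

    Toy assumption: product of input sizes times the selectivity of each
    pair that has a join predicate (pairs without one count as 1.0).
    """
    rows = 1.0
    for name in subset:
        rows *= sizes[name]
    for a, b in combinations(sorted(subset), 2):
        rows *= sel.get(frozenset((a, b)), 1.0)
    return rows


def best_plan(sizes, sel):
    """Dynamic programming over subsets of datasets.

    best[S] holds (cost, plan) for the cheapest way to join the datasets
    in S, where cost is the sum of estimated intermediate result sizes
    (a stand-in for the time cost of the corresponding jobs).
    """
    names = sorted(sizes)
    best = {frozenset([n]): (0.0, n) for n in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            out_rows = cardinality(subset, sizes, sel)
            candidates = []
            # Try every split of the subset into two non-empty parts.
            for r in range(1, k):
                for left_names in combinations(combo, r):
                    left = frozenset(left_names)
                    right = subset - left
                    cost = best[left][0] + best[right][0] + out_rows
                    candidates.append((cost, (best[left][1], best[right][1])))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(names)]


if __name__ == "__main__":
    sizes = {"A": 1000, "B": 100, "C": 10}
    sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
    cost, plan = best_plan(sizes, sel)
    print(cost, plan)
```

In this toy input, joining B with C first yields the smallest intermediate result, so the planner defers the large dataset A to the last step. A real planner along the lines of the abstract would replace `cardinality` with the time cost model and would also choose a join method per subtask.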
Keywords/Search Tags:distributed computing, join plan, time cost evaluation model, greedy, dynamic programming