
The Research And Improvement Of MapReduce In Scientific Computing

Posted on: 2014-01-03    Degree: Master    Type: Thesis
Country: China    Candidate: F F Zhou    Full Text: PDF
GTID: 2248330398979992    Subject: Computer software and theory
Abstract/Summary:
With the dramatic increase in heterogeneous data, cloud computing has emerged, and MapReduce, its core programming model, has attracted close attention, especially in academia. To address problems such as iteration support and intermediate-data storage, many scholars have proposed improvements and formed their own programming models, such as Hadoop, Twister, and HaLoop. To support iterative algorithms, the HaLoop model adds a loop-control mechanism, chiefly two new functions named ADDMap and ADDReduce, whose purpose is to control the number of iterations; the Twister model provides a corresponding loop-control mechanism as well.

Similarly, to run iterative algorithms, this thesis keeps the original interface but adds a parameter M to the Map, Reduce, ADDMap, and ADDReduce functions. M distinguishes four classes of algorithms in scientific computing: M = 1 denotes the first class, M = 2 the second, M = 3 the third, and M = 4 the fourth. Because the third and fourth classes are iterative, the frequently used functions and interfaces of these two classes are packaged into an adapter; during experiments, developers add the corresponding function bodies as needed. To ensure data security, the experimental data are declared with a protected type. Data that change rarely are placed in buffer pools, so that they can be read and written in the local file system of a Slave node.
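The adapter idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the thesis's actual code: the parameter M, the ADDMap/ADDReduce naming, and the "iterative for M = 3, 4" rule come from the abstract, while the class name, method signatures, and iteration bookkeeping are assumptions made for the sketch.

```python
class ScientificAdapter:
    """Hypothetical adapter packaging the common Map/Reduce interfaces.

    M (1-4) selects one of the four classes of scientific-computing
    algorithms; classes 3 and 4 are iterative, so an ADDMap-style hook
    decides whether another iteration is needed.
    """

    def __init__(self, M, max_iterations=1):
        assert M in (1, 2, 3, 4), "M selects one of four algorithm classes"
        self.M = M
        # Only classes 3 and 4 are iterative; the others run a single pass.
        self.max_iterations = max_iterations if M in (3, 4) else 1

    def map(self, key, value, M):
        # Developers supply the function body for their algorithm class.
        raise NotImplementedError

    def reduce(self, key, values, M):
        raise NotImplementedError

    def add_map(self, iteration):
        # ADDMap-style loop control: should another iteration run?
        return iteration < self.max_iterations

    def run(self, records):
        result = records
        iteration = 0
        while self.add_map(iteration):
            # Shuffle: group intermediate values by key.
            intermediate = {}
            for key, value in result:
                for k, v in self.map(key, value, self.M):
                    intermediate.setdefault(k, []).append(v)
            result = [(k, self.reduce(k, vs, self.M))
                      for k, vs in intermediate.items()]
            iteration += 1
        return result
```

A non-iterative job (M = 1) would subclass the adapter, fill in `map` and `reduce`, and call `run` once; an iterative job (M = 3 or 4) would in addition set `max_iterations` or override `add_map` with a convergence test.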
Reading and writing these data locally, rather than from the Master node, not only reduces the workload of the Master node but also improves operating efficiency.

To address the shortcomings of existing scheduling algorithms, this paper presents an improved algorithm. The algorithm adds the following parameters: the cost of computation, the minimum deadline of a task, and the processing speed of clients. It maintains two queues: a queue of computing resources and a queue ordered by the minimum deadlines of tasks. The priority of tasks in the computing-resource queue is determined by the cost of computation; when this cost is calculated, it is multiplied by a weight. The weight is determined by the parameter M added to the Map, Reduce, ADDMap, and ADDReduce functions: if M = 1 the weight is 1, if M = 2 the weight is 2, if M = 3 the weight is 3, and if M = 4 the weight is 4. The priority of the deadline queue is determined by the deadline. The resource queue has higher priority than the deadline queue, but if a task's deadline reaches zero, that task is inserted directly at the head of the computing-resource queue. As a result, the algorithm not only ensures the efficient execution of large tasks but also takes care of the efficient execution of small tasks, and it achieved good performance. Finally, the paper closes with worked examples and several associated experiments.
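The two-queue scheduling policy can be sketched as below. This is a simplified illustration under stated assumptions, not the thesis's implementation: the weighted cost (cost × M), the two queues, and the "deadline of zero jumps to the front" rule come from the abstract, while the class names, the use of `heapq`, and the duplicate-dispatch bookkeeping are assumptions of the sketch.

```python
import heapq

class Task:
    def __init__(self, name, cost, M, deadline):
        self.name = name
        self.cost = cost          # estimated computation cost
        self.M = M                # algorithm class (1-4); weight equals M
        self.deadline = deadline  # minimum deadline (time units remaining)

    @property
    def priority(self):
        # Cost multiplied by the weight drives the resource queue.
        return self.cost * self.M

class TwoQueueScheduler:
    def __init__(self):
        self.resource_q = []   # max-heap on weighted cost (negated)
        self.deadline_q = []   # min-heap on remaining deadline
        self.dispatched = set()
        self._seq = 0          # tie-breaker so heap never compares Tasks

    def submit(self, task):
        self._seq += 1
        heapq.heappush(self.resource_q, (-task.priority, self._seq, task))
        heapq.heappush(self.deadline_q, (task.deadline, self._seq, task))

    def next_task(self):
        # A task whose deadline has reached zero preempts everything,
        # as if inserted at the head of the resource queue.
        while self.deadline_q and self.deadline_q[0][0] <= 0:
            _, _, urgent = heapq.heappop(self.deadline_q)
            if id(urgent) not in self.dispatched:
                self.dispatched.add(id(urgent))
                return urgent
        # Otherwise serve the resource queue by weighted cost.
        while self.resource_q:
            _, _, task = heapq.heappop(self.resource_q)
            if id(task) not in self.dispatched:
                self.dispatched.add(id(task))
                return task
        return None
```

Note the effect of the weight: a small task from an iterative class (high M) can outrank a larger task from class 1, which is how the policy balances large and small tasks.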
Keywords/Search Tags:MapReduce, cloud computing, Map, Reduce