The Design And Implementation Of A Distributed Computing System Based On MapReduce

Posted on: 2017-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: D Xi
Full Text: PDF
GTID: 2348330512954292
Subject: Engineering
Abstract/Summary:
In the traditional sequential computation model, a task often has to wait until the previous task has completed, which wastes both machine resources and time. Computer science subsequently made considerable progress in this area, and multithreading and coroutines emerged. These concurrent computation models greatly improve computational efficiency. When a computer has multiple CPUs, or a CPU has multiple cores, a concurrent model can use them cooperatively to complete a task. Even on a single-core, single-CPU machine, context switching between threads or processes can schedule multiple computing tasks, so that tasks needing only a short time do not have to wait long behind others. As data volumes grow, however, even a concurrent model on a single machine cannot meet performance requirements, and distributed computing has therefore attracted great interest.

Distributed computing coordinates multiple computers. It resembles concurrent computation on a single machine, but is more complicated because the processes or threads are spread across multiple computers: each computer is a compute node, and the nodes cooperate on one or more computing tasks. In distributed computing, data can be sliced and spread across a number of machines, each slice is processed separately, and the partial results are merged to obtain the final solution. MapReduce is a distributed computation model for processing large-scale data in parallel: the data is divided among a large number of machines, computed in groups, and then merged.

We design and implement a MapReduce-based distributed computing system with high performance, scalability, and load balancing. The work described in this thesis includes:

1. Designing and implementing the system architecture, including the main node (Master), compute node (Agent), and process-driven node (Driver) modules.
2. Designing and implementing resource management. The resource management module updates resource distribution information in real time, allocates computing resources, tracks resource utilization for individual applications, and balances resource load across different applications.
3. Designing and implementing task scheduling. The task scheduling module manages and executes the specific tasks, which are closely tied together as a logical series of tasks.

Finally, a simple performance test of the system was carried out and analyzed, illustrating the efficiency of the distributed computing system.
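The slice, group, and merge idea behind MapReduce can be illustrated with a minimal single-machine sketch. This is not the system described in the thesis: the word-count task, the function names (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_group`, `run_word_count`), and the use of threads in place of compute nodes are all assumptions made purely for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_chunks):
    """Slice the input into roughly equal chunks, one per simulated node."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(lines):
    """Map phase: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_chunks):
    """Group phase: collect all values emitted for the same key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: merge the values of one key into a single result."""
    return key, sum(values)


def run_word_count(lines, num_workers=3):
    """Run the three phases; threads stand in for compute nodes (assumption)."""
    chunks = split_into_chunks(lines, num_workers)
    # A real distributed system would send each chunk to a separate machine.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    groups = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in groups.items())
```

Calling `run_word_count(["map reduce", "map"])` yields a dictionary of per-word counts. Resource management, task scheduling, and load balancing across applications, which the thesis addresses, are deliberately left out of this sketch.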
Keywords/Search Tags: Distributed computing, Resource Management, Flow, Task Manager, Load Balance