The Principle And Design Of Distributed Computing Platform Based On MapReduce

Posted on: 2011-10-25 | Degree: Master | Type: Thesis
Country: China | Candidate: W F Zhang | Full Text: PDF
GTID: 2198330338486037 | Subject: Software engineering
Abstract/Summary:
The rapid development of Internet applications brings plenty of opportunities for enterprises, and a wide variety of personalized applications and services have been deployed. However, these applications generate massive amounts of data. It is therefore a challenge for new Internet companies to analyze this mass of information properly and efficiently in order to make decisions.
Traditionally, distributed computing systems are used to handle such large and complex tasks. A traditional distributed computing platform often depends on high-end, large-scale servers and needs programmers specialized in distributed and parallel computing for long-term design and maintenance. This often puts new Internet companies under tremendous economic pressure. Designing a scalable distributed computing platform composed of a large number of low-cost machines has therefore become particularly important.
MapReduce is a parallel programming model for processing large data sets. Programs written in this model can be parallelized and executed on large clusters of low-cost machines. A distributed computing system based on the MapReduce model takes care of partitioning the input data, scheduling execution across the cluster, handling machine failures, and managing the necessary communication among machines. This allows programmers without parallel programming experience to make use of the resources of a large distributed system.
Building on the merits of the MapReduce programming model, and after analyzing various existing distributed computing systems, this thesis proposes a design for a general-purpose, scalable distributed computing platform that runs on low-cost machines.
First, after comparing several currently popular distributed computing technologies and summarizing the advantages and disadvantages of each, we propose a distributed computing platform framework better suited to analyzing massive data. Then, working from the overall structure, we design the functional sub-modules. We focus on the system I/O modules and the MapReduce modules, because system I/O directly affects the system's overall performance, while the MapReduce module is the core of the whole system; a well-designed MapReduce sub-module is a prerequisite for the system to function well. Finally, this thesis discusses the key strategies that may affect system performance, including task scheduling and fault-tolerance mechanisms, among others.
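The MapReduce model summarized in the abstract can be illustrated with a minimal single-process sketch. This is not the platform designed in the thesis; it only shows, under simplifying assumptions, the steps the abstract names: partitioning the input, running map tasks, grouping intermediate pairs by key, and reducing each group. All function names (`partition`, `run_with_retry`, `map_reduce`, and the word-count helpers) and the retry limit are hypothetical, chosen for illustration; the retry loop is only a stand-in for a fault-tolerance mechanism that re-executes failed tasks.

```python
from collections import defaultdict


def partition(records, num_splits):
    """Split the input into num_splits chunks, one per map task."""
    return [records[i::num_splits] for i in range(num_splits)]


def run_with_retry(task, arg, max_attempts=3):
    """Re-run a failed task up to max_attempts times (assumed policy)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(arg)
        except Exception:
            if attempt == max_attempts:
                raise


def map_reduce(records, map_fn, reduce_fn, num_splits=3):
    """Run map, shuffle, and reduce sequentially in one process."""

    def map_task(split):
        # Apply the user's map function to every record in the split.
        return [pair for record in split for pair in map_fn(record)]

    # Map phase: one task per input split.
    intermediate = []
    for split in partition(records, num_splits):
        intermediate.extend(run_with_retry(map_task, split))

    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    """Emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def word_count_reduce(word, counts):
    """Sum the counts collected for one word."""
    return sum(counts)


lines = ["map reduce map", "reduce cluster", "map"]
counts = map_reduce(lines, word_count_map, word_count_reduce)
print(counts)
```

In a real cluster the map and reduce tasks would run on different machines and the shuffle would move data over the network; the user still supplies only the two functions `word_count_map` and `word_count_reduce`, which is the property that lets programmers without parallel programming experience use the system.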
Keywords/Search Tags: Data processing, MapReduce, Distributed computing, Scheduling, Fault tolerance