The Design And Implementation Of A Scalable MapReduce System Prototype

Posted on:2012-03-10

Degree:Master

Type:Thesis

Country:China

Candidate:N Zhu

Full Text:PDF

GTID:2218330362459430

Subject:Software College

Abstract/Summary:

Since MapReduce was first proposed in 2004, it has been widely adopted around the world. As a distributed computing framework, it greatly simplifies programmers' work by handling deadlock, fault tolerance, and many other problems that commonly trap developers. Because of this, more and more companies deploy their transaction processing systems on MapReduce. However, as the volume of data to be processed grows in the cloud era, the original MapReduce design runs into a scalability limit: under a given hardware configuration, mainstream MapReduce systems can support only a limited cluster size, which in turn limits the processing capacity of the cluster.

In this paper, I present a new MapReduce prototype based on a Distributed Hash Table (DHT). Users do not need to change the habits they formed on existing systems, but the design removes both the Master Node of MapReduce and the Name Node of the distributed file system. The distributed file system locates data through distributed hashing, and the MapReduce system invokes and schedules tasks through a distributed notification mechanism. In this way, the new system can in theory achieve the scalability of peer-to-peer systems, which has proven to be sufficient in the Internet environment.

I also implement a prototype based on this design. A typical experiment shows that the new system works well and introduces no change to the user experience. I therefore conclude that the approach proposed in this paper is feasible and can contribute to the field of large-scale data processing.
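To make concrete the idea of locating data by distributed hashing instead of asking a central Name Node, here is a minimal Python sketch of a consistent-hash ring. It is illustrative only: the abstract does not say which DHT or hash function the prototype uses, so the `HashRing` class, the SHA-1 based `ring_hash` function, the 32-bit ring size, and the node and block names are all assumptions made for this example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on a 2**32 ring (hash choice is an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Toy consistent-hash ring: a key is owned by the first node clockwise."""

    def __init__(self, nodes):
        # Sort nodes by their position on the ring so lookups can bisect.
        self._points = sorted((ring_hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._points]

    def owner(self, block_key: str) -> str:
        # First node whose position is >= the key's position, wrapping around.
        idx = bisect.bisect_left(self._positions, ring_hash(block_key))
        return self._points[idx % len(self._points)][1]


# Hypothetical node and block names, used only for illustration.
ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.owner("input/part-0001"))
```

Every peer that knows the same node list computes the same owner for a given key, so no single node has to hold the full block-location table. A real DHT would add routing between peers, replication, and handling of nodes joining and leaving, none of which this sketch attempts.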

Keywords/Search Tags:

MapReduce, Peer-to-Peer, Large-scale Data Processing

Related items

1	Managing large scale distributed data with peer-to-peer search trees
2	A Study On Key Techniques Of Large Scale Peer-to-peer Resource Sharing
3	Large Scale Peer-to-Peer Content Retrieval
4	A Study On Large Scale Peer-to-Peer Search And Applications
5	Data Management In Peer-to-Peer Systems
6	Research On Peer-to-Peer Resource Location In Large-scale Distributed Systems
7	Peer-to-Peer Based Large Scale Content Retrieval
8	Ontology-based search algorithms over large-scale unstructured peer-to-peer networks
9	Research On Network Utilization In Peer-To-Peer Services
10	Large-scale Peer-to-Peer Network Statistical Analysis, Characterization And Its Applications