Since MapReduce was first proposed in 2004, it has been widely adopted around the world. As a distributed computing framework, it greatly simplifies programmers' work by handling deadlock, fault tolerance, and many other problems that commonly trap developers. Because of this, more and more companies deploy their transaction processing systems on MapReduce. However, as the volume of data to be processed grows in the cloud era, the original MapReduce design runs into scalability limits: under a given hardware configuration, mainstream MapReduce systems can support only a limited cluster size, which in turn limits the processing capacity of the cluster.

In this paper, I present a new MapReduce prototype based on a Distributed Hash Table (DHT). Users do not need to change the habits they formed on existing systems, but the new system removes the Master Node from the MapReduce design and the Name Node from the distributed file system. The distributed file system locates data through distributed hashing, and the MapReduce system invokes and schedules tasks through a distributed notification mechanism. In this way, the new system can theoretically achieve the scalability of peer-to-peer systems, whose scalability has been shown to be good enough in the Internet environment.

In this work, I also implement a prototype based on this design. A typical experiment shows that the new system works well and does not affect the user experience. I therefore conclude that the approach proposed in this paper is feasible and can contribute to the area of large-scale data processing.
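
To make the idea of locating data through distributed hashing concrete, the following is a minimal sketch of a consistent-hash ring, one common way a DHT assigns keys to nodes. It is an illustration under assumptions and not the prototype's actual code: the class `HashRing`, the function `ring_position`, the node names, the block path, and the choice of SHA-1 are all hypothetical. It also assumes every peer knows the full node list, whereas a real DHT typically routes a lookup over several hops.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on a 160-bit ring (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring; not the paper's implementation."""

    def __init__(self, nodes):
        # Sort nodes by ring position so that lookups can use binary search.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def lookup(self, block_id: str) -> str:
        """Return the first node clockwise from the hash of block_id."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        i = bisect.bisect_left(self._positions, ring_position(block_id))
        # Wrap around to the first node when the key hashes past the last one.
        return self._ring[i % len(self._ring)][1]


# Any peer can compute the owner of a block locally, with no central Name Node.
ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("/logs/part-00000")
```

Because the owner depends only on the hash positions, every peer that holds the same node list resolves a block to the same node, and removing a node that does not own the block leaves that block's owner unchanged.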