The MapReduce programming model and its implementations have simplified many parallel applications. Because of the rising demand for higher computing performance, Graphics Processing Units (GPUs) have been used to accelerate MapReduce in several studies. Unlike on a CPU, high GPU utilization requires not only a decent parallel algorithm but also careful consideration of hardware details. This paper describes the development path of our MapReduce system from a single GPU to multiple GPUs. Utilization of each GPU is improved by using newer GPU features such as streams and Hyper-Q. Furthermore, several scheduling schemes are designed to avoid blocked GPU operations. To address the challenge of Big Data, our MapReduce system handles large data sets that exceed GPU and even CPU memory. Experimental results show the performance improvement and increased scalability gained from each acceleration technique. Although our current work is specific to MapReduce, many of the underlying ideas are also applicable to accelerating other GPU applications.