
GPU Computing In Massive Data Processing

Posted on: 2016-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: S J Xu
Full Text: PDF
GTID: 2308330461474136
Subject: Computer system architecture
Abstract/Summary:
The rapid growth of the Internet has led to an explosion of information: a large amount of data is generated every day in scientific, engineering, and business computing fields. Massive data poses a serious challenge to storage and computing. The demand for computing power often far exceeds what an organization's own IT infrastructure can supply, and the traditional response is to keep increasing hardware investment to accommodate the growth of big data. In addition, the complexity of traditional parallel programming models not only limits the scalability of such systems but also reduces their maintainability. These constraints motivate a new framework for parallel computation. In recent years, Hadoop has developed into a mainstream cloud computing platform thanks to its MapReduce computing framework and its efficient Hadoop Distributed File System (HDFS), which make it particularly suitable for handling massive amounts of data. HDFS provides high fault tolerance and scalability, allowing users to deploy Hadoop on cheap servers to form a distributed system. The MapReduce programming model hides the underlying details of the distributed system, so users can develop parallel applications without understanding those details. However, because of the limited degree of parallelism of CPUs, Hadoop can hardly handle data-intensive and compute-intensive problems; in this respect, Hadoop still lacks high-performance computing capability.

Modern GPUs are widely used in general-purpose computing and are commonly employed to accelerate tasks. A GPU (Graphics Processing Unit) consists of a large number of computing cores; it offers strong parallelism and computing power far beyond that of a CPU, so it excels at high-performance computing. GPU clusters have been widely used in scientific and engineering computing. However, GPU clusters have weak data storage capabilities and therefore lack good fault tolerance. If we integrate GPUs with Hadoop, we can take full advantage of GPUs' high-performance computing capabilities together with Hadoop's distributed computing model and its distributed file system. Based on this observation, this thesis explores how to use GPUs within Hadoop. We introduce four methods for applying GPUs in Hadoop, compare their experimental results, analyze the four methods in detail, and demonstrate the effectiveness of using GPU computing in Hadoop.
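The abstract does not spell out the four integration methods, so the Java sketch below only illustrates one commonly used pattern for combining the two systems: a Hadoop map task that hands each input record to an external CUDA executable and emits its result. The binary name "gpu_kernel", its line-oriented stdin/stdout protocol, and the class name GpuOffloadMapper are assumptions introduced for illustration, not the thesis's actual implementation.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative sketch only: one possible way to invoke GPU code from a Hadoop map task.
    // The "gpu_kernel" executable and its one-line-in/one-line-out protocol are assumed.
    public class GpuOffloadMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            // Launch a local CUDA binary (assumed to be shipped to each node, e.g. via the
            // distributed cache) and stream this record to it over stdin.
            Process gpu = new ProcessBuilder("./gpu_kernel").start();

            try (Writer in = new OutputStreamWriter(gpu.getOutputStream());
                 BufferedReader out = new BufferedReader(
                         new InputStreamReader(gpu.getInputStream()))) {
                in.write(value.toString());
                in.write('\n');
                in.flush();
                in.close();                      // signal end of input to the GPU process

                String result = out.readLine();  // one result line per record (assumed protocol)
                if (result != null) {
                    context.write(new Text(key.toString()), new Text(result));
                }
            }
            gpu.waitFor();
        }
    }

In practice this per-record process launch would be far too slow; real integrations batch records or keep the GPU process alive across calls, which is part of what distinguishes the different integration methods the thesis compares.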
Keywords/Search Tags: Hadoop, GPU, Big Data, High Performance Computing (HPC), Distributed Computing, Cloud Computing