
GPU Computing In Massive Data Processing

Posted on: 2016-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: S J Xu
Full Text: PDF
GTID: 2308330461474136
Subject: Computer system architecture
Abstract/Summary:
The rapid growth of the Internet has led to an explosion of information: a large amount of data is generated every day in scientific, engineering, and business computing fields. Massive data poses a serious challenge to storage and computing. The demand for computing power often far exceeds what an organization's own IT infrastructure can supply, and the traditional response is to keep increasing hardware investment to accommodate the growth of big data. In addition, the complexity of traditional parallel programming models not only limits the scalability of such systems but also reduces their maintainability. These constraints motivate a new framework for parallel computation. In recent years, Hadoop has developed into a mainstream cloud computing platform thanks to its MapReduce computing framework and its efficient Hadoop Distributed File System (HDFS), which make it particularly suitable for handling massive amounts of data. HDFS provides high fault tolerance and scalability, allowing users to deploy Hadoop on cheap servers to form a distributed system. The MapReduce programming model hides the underlying details of the distributed system, so users can develop parallel applications without understanding those details. However, because of the limited degree of parallelism of CPUs, Hadoop can hardly handle data-intensive and compute-intensive problems; in this respect, Hadoop still lacks high-performance computing capability.

Modern GPUs are widely used in general-purpose computing and are commonly employed to accelerate tasks. A GPU (Graphics Processing Unit) consists of a large number of computing cores; it offers strong parallelism and computing power far beyond that of a CPU, so it excels at high-performance computing. GPU clusters have been widely used in scientific and engineering computing. However, GPU clusters have weak data storage capabilities and therefore lack good fault tolerance. If we integrate GPUs with Hadoop, we can take full advantage of GPUs' high-performance computing capabilities together with Hadoop's distributed computing model and its distributed file system. Based on this observation, this thesis explores how to use GPUs within Hadoop. We introduce four methods for applying GPUs in Hadoop, compare their experimental results, analyze the four methods in detail, and demonstrate the effectiveness of using GPU computing in Hadoop.
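The abstract does not spell out the four integration methods, so the Java sketch below only illustrates one commonly used pattern for combining the two systems: a Hadoop map task that hands each input record to an external CUDA executable and emits its result. The binary name "gpu_kernel", its line-oriented stdin/stdout protocol, and the class name GpuOffloadMapper are assumptions introduced for illustration, not the thesis's actual implementation.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative sketch only: one possible way to invoke GPU code from a Hadoop map task.
    // The "gpu_kernel" executable and its one-line-in/one-line-out protocol are assumed.
    public class GpuOffloadMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            // Launch a local CUDA binary (assumed to be shipped to each node, e.g. via the
            // distributed cache) and stream this record to it over stdin.
            Process gpu = new ProcessBuilder("./gpu_kernel").start();

            try (Writer in = new OutputStreamWriter(gpu.getOutputStream());
                 BufferedReader out = new BufferedReader(
                         new InputStreamReader(gpu.getInputStream()))) {
                in.write(value.toString());
                in.write('\n');
                in.flush();
                in.close();                      // signal end of input to the GPU process

                String result = out.readLine();  // one result line per record (assumed protocol)
                if (result != null) {
                    context.write(new Text(key.toString()), new Text(result));
                }
            }
            gpu.waitFor();
        }
    }

In practice this per-record process launch would be far too slow; real integrations batch records or keep the GPU process alive across calls, which is part of what distinguishes the different integration methods the thesis compares.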
Keywords/Search Tags: Hadoop, GPU, Big Data, High Performance Computing (HPC), Distributed Computing, Cloud Computing