Research Of Task-level Data Processing Based On Multicore CPU And Test Of Its Performance On Cluster Platform

Posted on:2012-12-22

Degree:Master

Type:Thesis

Country:China

Candidate:T J Zhang

Full Text:PDF

GTID:2218330362454340

Subject:Computer software and theory

Abstract/Summary:

With the development of computer networks and information technology, the volume of information that must be processed every day is growing at an alarming rate. This is especially true for companies such as Google, which analyzes and processes Internet pages on a global scale and uses Google MapReduce to process petabyte-scale collections of web pages, audio, and video files. The success of Google's large-scale data processing prompted the development of an open-source version, Hadoop MapReduce, by the Apache community. The open-source version of MapReduce is written in Java. It has three components: NameNode, SecondaryNameNode, and DataNode. The NameNode manages the cluster and runs the JobTracker management process. The SecondaryNameNode is a backup server for the NameNode. The DataNodes are a cluster of computers that do the actual processing work using TaskTracker. These three components are connected through the HDFS distributed file system. When a DataNode is a multicore computer and the data it is given is smaller than 64 MB, the advantage of its multiple cores disappears: it can use only one core, and the others all sit idle.
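
The idle-core effect can be illustrated with a little arithmetic. The sketch below is a simplified model, not Hadoop's actual split logic: it assumes one single-threaded map task per 64 MB block and that all tasks run on one node. The names `map_task_count` and `busy_core_fraction` are hypothetical helpers introduced only for this illustration.

```python
import math

BLOCK_SIZE_MB = 64  # block size named in the abstract


def map_task_count(input_mb, block_mb=BLOCK_SIZE_MB):
    """Assumed model: one map task per block-sized piece of input."""
    if input_mb <= 0:
        return 0
    return math.ceil(input_mb / block_mb)


def busy_core_fraction(input_mb, cores):
    """Fraction of a node's cores that receive a map task,
    assuming each task is single-threaded and occupies one core."""
    tasks = map_task_count(input_mb)
    return min(tasks, cores) / cores
```

Under these assumptions, a 48 MB input on an 8-core node produces a single map task, so only one eighth of the cores do any work.
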
To handle this situation, this paper presents a task-level MapReduce model to replace the traditional thread-level MapReduce. After analyzing the operating mechanism and constraints of Hadoop, this paper builds a Hadoop computing platform and gives solutions for the parts of the cluster that are prone to failure. It then analyzes the operating mechanism of thread-level MapReduce in detail and, through comparative experiments, points out the limitations of thread-level MapReduce when processing small blocks of data. This paper combines Hadoop MapReduce with Intel's TBB parallel library, which is written in C++, to form a task-level MapReduce. Several groups of experiments that compute PI on the cluster demonstrate the advantage of task-level MapReduce in processing small blocks of data. This paper also compares the performance of the two levels of MapReduce comprehensively by experiment and establishes the relationship among the number of cores, the data scale, and performance.
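
The task-level idea can be sketched as follows: a small job is cut into many fine-grained tasks, each task is mapped to a partial result, and the partial results are reduced to one value. This is a minimal Python illustration, not the thesis's C++/TBB implementation. The abstract does not say how PI is computed, so the midpoint-rule integration of 4/(1+x^2) over [0, 1] is an assumption, as are the names `partial_sum`, `split_range`, and `estimate_pi` and their default parameters.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(start, stop, steps):
    """Map step: midpoint-rule sum of 4/(1+x^2) over slices start..stop-1."""
    width = 1.0 / steps
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width


def split_range(steps, tasks):
    """Cut [0, steps) into `tasks` contiguous, nearly equal sub-ranges."""
    base, extra = divmod(steps, tasks)
    ranges, start = [], 0
    for t in range(tasks):
        stop = start + base + (1 if t < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def estimate_pi(steps=200_000, tasks=8, workers=4):
    """Run one task per sub-range on a worker pool, then reduce by summing."""
    ranges = split_range(steps, tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: partial_sum(r[0], r[1], steps), ranges)
        return sum(partials)
```

Calling `estimate_pi()` returns an approximation of PI. In CPython the worker threads share one interpreter lock, so this sketch shows the task decomposition rather than a real multicore speedup; a native task scheduler such as TBB is what would spread the tasks across cores.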

Keywords/Search Tags:

Multicore, Cluster, MapReduce, Hadoop, TBB

Related items

1	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
2	A Scalable MapReduce For Multicore System
3	The Optimization Of High Performance MapReduce FairScheduler And The Implementation On Simulator Of Huge Scale Cluster
4	Research On Hadoop Cluster Scheduling Optimization
5	An Optimized MapReduce Workflow Scheduling Algorithm For Heterogeneous Computing
6	Research And Implementation Of Expansibility Oriented Cluster Architecture
7	Research On Distributed SVM Algorithm Based On Hadoop Platform
8	A Deterministic And Scalable MapReduce For Multicore Systems
9	Research On Data Cube Technology Based On MapReduce
10	The Research Of Improving Performance Of Hadoop Cluster