Research Of Task-level Data Processing Based On Multicore CPU And Test Of Its Performance On Cluster Platform

Posted on: 2012-12-22  Degree: Master  Type: Thesis
Country: China  Candidate: T J Zhang  Full Text: PDF
GTID: 2218330362454340  Subject: Computer software and theory
Abstract/Summary:
With the development of computer networks and information technology, the volume of information that must be processed is growing at an alarming rate every day. This is especially true for companies such as Google, which analyzes and processes Internet pages on a global scale and uses Google MapReduce to process petabyte-scale collections of web pages, audio, and video files. The success of Google's large-scale data processing has promoted the development of an open-source version, Hadoop MapReduce, by the Apache community. The open-source version of MapReduce is written in Java. It has three components: NameNode, SecondaryNameNode, and DataNode. The NameNode is the manager of the cluster, with a management process, JobTracker, running on it. The SecondaryNameNode is a backup server for the NameNode. The DataNodes are a cluster of computers that do the actual processing work using TaskTracker. These three components are connected through the HDFS distributed file system. When a DataNode is a multicore computer and the data it is given is smaller than 64 MB, the advantage of its multiple cores disappears: it can use only one core, and the others all sit idle.
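The single-core limitation described above can be illustrated with a little arithmetic. The sketch below is a simplified model, not Hadoop's actual split logic: it assumes one map task per 64 MB block, that each map task occupies exactly one core, and that all tasks for the input run on a single node. The function names `map_task_count` and `busy_cores` are hypothetical and chosen only for this illustration.

```python
import math

BLOCK_SIZE_MB = 64  # block size quoted in the abstract


def map_task_count(input_mb: float, block_mb: float = BLOCK_SIZE_MB) -> int:
    """Simplified model: one map task per block-sized split, at least one."""
    return max(1, math.ceil(input_mb / block_mb))


def busy_cores(input_mb: float, cores: int, block_mb: float = BLOCK_SIZE_MB) -> int:
    """Cores kept busy on one node, assuming each map task occupies one core."""
    return min(cores, map_task_count(input_mb, block_mb))


if __name__ == "__main__":
    # Inputs at or below one block yield a single task, so only one core works.
    for size_mb in (16, 64, 200, 1024):
        print(size_mb, "MB:", map_task_count(size_mb), "map task(s),",
              busy_cores(size_mb, cores=4), "of 4 cores busy")
```

Under these assumptions, any input of 64 MB or less maps to a single task, which is the case the thesis targets.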
To handle this situation, this thesis presents a task-level MapReduce model and uses it to replace the traditional thread-level MapReduce. After analyzing the operational mechanism of Hadoop and its constraints, the thesis builds a Hadoop computing platform and gives solutions for the parts of the cluster that are prone to failure. It then analyzes the operating mechanism of thread-level MapReduce in detail and, through comparative experiments, points out the limitations of thread-level MapReduce when processing small blocks of data. The thesis combines Hadoop MapReduce with Intel's TBB parallel library, which is written in C++, to form a task-level MapReduce. Several groups of experiments that compute π on the cluster demonstrate the advantage of task-level MapReduce in processing small blocks of data. Finally, the thesis gives a comprehensive experimental comparison of the performance of the two levels of MapReduce and establishes the relationship among the number of cores, data scale, and performance.
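The idea of splitting one small job into many tasks, as in the π experiment mentioned above, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which method the thesis uses to compute π, so midpoint-rule integration of 4/(1+x²) over [0, 1] is assumed here, and Python's `ThreadPoolExecutor` stands in for TBB's task scheduler. In CPython the threads do not execute this arithmetic in parallel, so the sketch shows the task decomposition rather than the speedup. The names `partial_pi` and `estimate_pi` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_pi(start: int, stop: int, steps: int) -> float:
    """Midpoint-rule sum of 4/(1+x^2) over sub-intervals start..stop-1."""
    width = 1.0 / steps
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width


def estimate_pi(steps: int = 200_000, tasks: int = 8, workers: int = 4) -> float:
    """Split the integration range into tasks and sum the partial results."""
    chunk = math.ceil(steps / tasks)
    ranges = [(lo, min(lo + chunk, steps)) for lo in range(0, steps, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each range is an independent task; the pool assigns tasks to workers.
        parts = pool.map(lambda r: partial_pi(r[0], r[1], steps), ranges)
        return sum(parts)


if __name__ == "__main__":
    print(estimate_pi())
```

Because the tasks are independent and much smaller than a 64 MB block, a scheduler can spread them across every core of a node even when the whole job fits in a single block.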
Keywords/Search Tags:Multicore, Cluster, MapReduce, Hadoop, TBB