
Research on Hadoop Deployment and Tracing System in the Datacenter

Posted on: 2012-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: D C Huang
Full Text: PDF
GTID: 2218330362456517
Subject: Computer software and theory
Abstract/Summary:
MapReduce and its open-source implementation Hadoop have gained wide popularity in both academia and industry. However, developing a highly effective MapReduce application typically requires extensive cluster experience and an understanding of MapReduce internals. In addition, quickly setting up a MapReduce cluster environment is another common concern for users. How to rapidly deploy a MapReduce cluster and then trace its internal workflow is therefore a problem that demands a practical solution.

This thesis presents HDTS (Hadoop Deploy and Tracing System), which consists of two sub-systems: Hadoop Deploy and Hadoop Tracing. The Hadoop Deploy sub-system provides users with a convenient way to deploy and configure a Hadoop cluster; through a user-friendly web portal, it enables users to distribute, configure, and start Hadoop across the cluster. The Hadoop Tracing sub-system provides a tool for tracing MapReduce jobs dynamically. It is built on a close study of the Hadoop source code, into which extra instrumentation code is inserted. The tracing sub-system uses its own tracing kernel; once the kernel is installed, users notice no difference in normal operation, and multiple users can trace jobs concurrently.

The Hadoop Deploy sub-system is built on the Ext Ajax framework and Python CGI; the Hadoop Tracing sub-system uses Java as the implementation language for both the server and client sides. Black-box testing shows that both sub-systems perform their expected functions, and performance testing shows that the tracing sub-system incurs a maximum overhead of 4%.
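To illustrate the kind of source-level instrumentation the abstract describes, the following minimal Java sketch shows a trace probe that records timestamped enter/exit events around a job phase and appends them to a local log. The TraceProbe class, its event format, the log file name, and the job identifier are hypothetical illustrations, not code taken from HDTS or Hadoop.

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical trace probe: records timestamped enter/exit events for a
// named phase of a MapReduce job, appending them to a per-node trace log.
// This is only a sketch of the instrumentation idea, not HDTS code.
public class TraceProbe {
    private final String logPath;

    public TraceProbe(String logPath) {
        this.logPath = logPath;
    }

    // Append one event line: <epoch-millis> TAB <jobId> TAB <phase> TAB <event>
    private synchronized void record(String jobId, String phase, String event) {
        try (PrintWriter out = new PrintWriter(new FileWriter(logPath, true))) {
            out.printf("%d\t%s\t%s\t%s%n", System.currentTimeMillis(), jobId, phase, event);
        } catch (IOException e) {
            // Tracing must never break the traced job; log the failure and continue.
            System.err.println("TraceProbe: " + e.getMessage());
        }
    }

    public void enter(String jobId, String phase) { record(jobId, phase, "ENTER"); }
    public void exit(String jobId, String phase)  { record(jobId, phase, "EXIT"); }

    // Demo: wrap a simulated map phase with enter/exit probes.
    public static void main(String[] args) throws InterruptedException {
        TraceProbe probe = new TraceProbe("hdts-trace.log");
        probe.enter("job_2012_0001", "map");
        Thread.sleep(50); // stand-in for the actual map work
        probe.exit("job_2012_0001", "map");
        System.out.println("Trace events appended to hdts-trace.log");
    }
}

In a multi-user setting like the one the abstract mentions, such probes would typically tag every event with a job or user identifier so that traces from concurrent jobs can be separated when collected; the exact mechanism HDTS uses is not detailed in the abstract.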
Keywords/Search Tags: large-scale data processing, MapReduce workflow, environment deployment, dynamic tracing