
Research on Hadoop Deployment and Tracing System in the Datacenter

Posted on: 2012-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: D C Huang
Full Text: PDF
GTID: 2218330362456517
Subject: Computer software and theory
Abstract/Summary:
MapReduce and its open-source implementation Hadoop have gained wide popularity in both academia and industry. However, developing a highly effective MapReduce application typically requires extensive cluster experience and an understanding of MapReduce internals. In addition, quickly setting up a MapReduce cluster environment is another common concern for users. How to rapidly deploy a MapReduce cluster and then trace its internal workflow is therefore a problem that demands a practical solution.

This thesis presents HDTS (Hadoop Deploy and Tracing System), which consists of two sub-systems: Hadoop Deploy and Hadoop Tracing. The Hadoop Deploy sub-system provides users with a convenient way to deploy and configure a Hadoop cluster; through a user-friendly web portal, it enables users to distribute, configure, and start Hadoop across the cluster. The Hadoop Tracing sub-system provides a tool for tracing MapReduce jobs dynamically. It is built on a close study of the Hadoop source code, into which extra instrumentation code is inserted. The tracing sub-system uses its own tracing kernel; once the kernel is installed, users notice no difference in normal operation, and multiple users can trace jobs concurrently.

The Hadoop Deploy sub-system is built on the Ext Ajax framework and Python CGI; the Hadoop Tracing sub-system uses Java as the implementation language for both the server and client sides. Black-box testing shows that both sub-systems perform their expected functions, and performance testing shows that the tracing sub-system incurs a maximum overhead of 4%.
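To illustrate the kind of source-level instrumentation the abstract describes, the following minimal Java sketch shows a trace probe that records timestamped enter/exit events around a job phase and appends them to a local log. The TraceProbe class, its event format, the log file name, and the job identifier are hypothetical illustrations, not code taken from HDTS or Hadoop.

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical trace probe: records timestamped enter/exit events for a
// named phase of a MapReduce job, appending them to a per-node trace log.
// This is only a sketch of the instrumentation idea, not HDTS code.
public class TraceProbe {
    private final String logPath;

    public TraceProbe(String logPath) {
        this.logPath = logPath;
    }

    // Append one event line: <epoch-millis> TAB <jobId> TAB <phase> TAB <event>
    private synchronized void record(String jobId, String phase, String event) {
        try (PrintWriter out = new PrintWriter(new FileWriter(logPath, true))) {
            out.printf("%d\t%s\t%s\t%s%n", System.currentTimeMillis(), jobId, phase, event);
        } catch (IOException e) {
            // Tracing must never break the traced job; log the failure and continue.
            System.err.println("TraceProbe: " + e.getMessage());
        }
    }

    public void enter(String jobId, String phase) { record(jobId, phase, "ENTER"); }
    public void exit(String jobId, String phase)  { record(jobId, phase, "EXIT"); }

    // Demo: wrap a simulated map phase with enter/exit probes.
    public static void main(String[] args) throws InterruptedException {
        TraceProbe probe = new TraceProbe("hdts-trace.log");
        probe.enter("job_2012_0001", "map");
        Thread.sleep(50); // stand-in for the actual map work
        probe.exit("job_2012_0001", "map");
        System.out.println("Trace events appended to hdts-trace.log");
    }
}

In a multi-user setting like the one the abstract mentions, such probes would typically tag every event with a job or user identifier so that traces from concurrent jobs can be separated when collected; the exact mechanism HDTS uses is not detailed in the abstract.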
Keywords/Search Tags: large-scale data processing, MapReduce workflow, environment deployment, dynamic tracing