MapReduce and its open-source implementation, Hadoop, have gained wide popularity in academia and industry. However, developing a highly effective MapReduce application typically requires extensive experience with cluster usage and an understanding of the mechanisms beneath MapReduce. Quickly setting up a MapReduce cluster environment is another concern for users. How to quickly deploy a MapReduce cluster and then trace its internal workflow is therefore a problem that needs an immediate solution. HDTS (Hadoop Deploy and Tracing System) consists of two sub-systems: Hadoop Deploy and Hadoop Tracing. The Hadoop Deploy sub-system provides users with a convenient way to deploy and configure a Hadoop cluster; the Hadoop Tracing sub-system provides users with a tool for tracing MapReduce jobs dynamically. Through a user-friendly portal, the Hadoop Deploy sub-system enables users to deploy, distribute, and start Hadoop on a cluster. The Hadoop Tracing sub-system is based on a deep understanding of the Hadoop source code and inserts extra code into the original source. It adopts its own tracing kernel; once this kernel is installed, users do not notice any difference, and multiple users are supported. The Hadoop Deploy sub-system uses the Ext Ajax framework and Python CGI as its supporting technologies; the Hadoop Tracing sub-system uses Java as the implementation language for both the server and client sides. Black-box testing shows that both sub-systems perform their expected functions, and performance testing shows that the tracing sub-system has a maximum overhead of 4%.