
Design And Implementation Of The Hadoop Platform Benchmark Suite

Posted on: 2016-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: L W Chuai
Full Text: PDF
GTID: 2308330479991080
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the research and development of cloud computing, the Hadoop platform has received much attention. Hadoop is an open-source distributed infrastructure that includes the distributed storage system HDFS and the distributed computing framework MapReduce. Hadoop has evolved from version 1 to version 2, whose main feature is the addition of the YARN resource management component, which improves Hadoop's architecture. Hadoop has been applied to web search, machine learning, business analysis, biological computing, and other big data areas. It allows users to develop distributed programs without understanding the details of the distributed layer. However, using the Hadoop platform effectively, improving the resource utilization of a Hadoop cluster, and improving the performance of user programs remain challenges for users.

To address these problems, this thesis designs and implements a Hadoop benchmark suite with three main components: a comprehensive and representative workload set, a Hadoop resource monitoring tool, and a MapReduce performance tracking tool. The thesis mainly completes the following work: (1) A comprehensive set of test programs evaluates the performance of the Hadoop platform, including HDFS read and write performance, MapReduce performance under different types of load, and YARN performance. (2) The resource monitoring tool observes Hadoop's resource utilization in real time, including CPU, memory, disk, and network, which supports load type analysis. (3) The performance tracking tool tracks the run-time information of MapReduce programs, making it easier for users to understand the running characteristics of a load, find performance bottlenecks, and then optimize their programs.

This thesis implements the HDFS read and write performance testing program IMP-DFSIO, which is more stable and convincing than the TestDFSIO program included in Hadoop. It also focuses on implementing the MapReduce performance tracking tool PerfTrace, which uses BTrace scripts to extract run-time information from the sub-processes of the MapReduce model. This makes it convenient for users to analyze program properties, find performance bottlenecks, and optimize program performance.
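
As a rough illustration of the kind of figures an HDFS read/write benchmark reports, the sketch below aggregates per-task measurements into an overall throughput, a mean per-task I/O rate, and the standard deviation of that rate. These are, to the best of our understanding, the summary figures that the stock TestDFSIO prints. It is not the IMP-DFSIO implementation, whose details are not given in this abstract; the names `TaskResult` and `summarize` and the sample numbers are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass

MB = 1024 * 1024  # bytes per megabyte, as used for reporting


@dataclass
class TaskResult:
    """One I/O task's measurement (hypothetical record, not a Hadoop type)."""
    bytes_moved: int   # bytes read or written by the task
    seconds: float     # wall-clock time the task spent on I/O


def summarize(results):
    """Aggregate per-task results into DFSIO-style summary figures."""
    if not results:
        raise ValueError("at least one task result is required")

    total_mb = sum(r.bytes_moved for r in results) / MB
    total_seconds = sum(r.seconds for r in results)

    # Per-task I/O rate in MB/s.
    rates = [(r.bytes_moved / MB) / r.seconds for r in results]
    mean_rate = sum(rates) / len(rates)

    # Population standard deviation of the per-task rates.
    variance = sum(x * x for x in rates) / len(rates) - mean_rate ** 2
    std_dev = math.sqrt(max(variance, 0.0))  # guard against tiny negatives

    return {
        # Total data divided by the summed task times.
        "throughput_mb_s": total_mb / total_seconds,
        "avg_io_rate_mb_s": mean_rate,
        "io_rate_std_dev": std_dev,
    }


if __name__ == "__main__":
    # Made-up sample: two tasks moving 100 MB each in 10 s and 20 s.
    sample = [TaskResult(100 * MB, 10.0), TaskResult(100 * MB, 20.0)]
    for name, value in summarize(sample).items():
        print(f"{name}: {value:.3f}")
```

Because throughput divides total data by summed task time, slow tasks pull it down more than they pull down the mean per-task rate, so a large gap between the two figures (or a large standard deviation) hints at uneven task performance across the cluster.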
Keywords/Search Tags: Cloud Computing, HDFS, MapReduce, YARN, Performance Evaluation