Performance Monitoring And Analysis On Hadoop-Based Distributed Computing Platform

Posted on: 2016-10-03 | Degree: Master | Type: Thesis
Country: China | Candidate: J S Yin
GTID: 2298330467493190 | Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of computer science and mobile internet technology, we have entered a new era in which data is growing explosively. Vast amounts of unstructured data are produced every day in fields such as social networking, e-commerce, internet finance and biological health. These data are closely tied to user behavior, and we are eager to mine useful information from them in order to change people's lifestyles and improve their quality of life. Driven by strong market demand, Hadoop has been adopted by a growing number of enterprises across industries as the most efficient Big Data processing tool, and they use it to meet a wide variety of data processing requirements. However, maintaining a large-scale cluster and keeping it running normally with high performance is becoming a serious problem for most Hadoop users.

In this paper, we first introduce the basics of Hadoop and give an overview of some popular distributed monitoring systems and Hadoop monitoring techniques. We then explain the monitoring system of the Hadoop platform in detail, including its monitoring objectives, architecture design and key technologies. After that, we analyze the resource consumption patterns of several representative MapReduce applications and provide performance tuning techniques that help Hadoop users improve the performance of MapReduce workloads and maximize resource utilization. Finally, we propose a modeling method that can help Hadoop users estimate the execution time of a MapReduce job.
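The abstract does not describe how the thesis's execution-time model works. Purely as an illustration, and not as the author's method, the sketch below shows one generic way to bound a MapReduce job's execution time from profiled task durations: for n tasks with mean duration avg and maximum duration max, greedily scheduled on k parallel slots, a stage finishes no earlier than n·avg/k and no later than (n−1)·avg/k + max. It assumes the reduce stage starts only after all map tasks finish (real Hadoop overlaps the shuffle with the map phase) and that task durations are already known; the names `stage_bounds` and `estimate_job_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class StageBounds:
    lower: float  # best-case stage duration (perfectly balanced slots)
    upper: float  # worst-case stage duration under greedy scheduling


def stage_bounds(task_times: Sequence[float], slots: int) -> StageBounds:
    """Bound the makespan of greedily scheduling tasks on `slots` slots.

    Lower bound: total work spread evenly, n * avg / slots.
    Upper bound: (n - 1) * avg / slots + max task duration.
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    n = len(task_times)
    if n == 0:
        return StageBounds(0.0, 0.0)
    avg = sum(task_times) / n
    longest = max(task_times)
    return StageBounds(n * avg / slots, (n - 1) * avg / slots + longest)


def estimate_job_time(
    map_times: Sequence[float],
    map_slots: int,
    reduce_times: Sequence[float],
    reduce_slots: int,
) -> Tuple[float, float, float]:
    """Return (lower, upper, midpoint) estimates for a whole job.

    Assumes (hypothetically) that the reduce stage runs strictly
    after the map stage, so the two stage bounds simply add up.
    """
    m = stage_bounds(map_times, map_slots)
    r = stage_bounds(reduce_times, reduce_slots)
    lower = m.lower + r.lower
    upper = m.upper + r.upper
    return lower, upper, (lower + upper) / 2


if __name__ == "__main__":
    # Hypothetical profile: 8 map tasks of 10 s on 4 slots,
    # 2 reduce tasks of 5 s on 2 slots.
    print(estimate_job_time([10.0] * 8, 4, [5.0, 5.0], 2))
```

With the hypothetical profile in `__main__`, the job is bounded between 25 s and 35 s. Using the midpoint as a point estimate is a simple design choice; a fitted model based on monitored resource usage, as the thesis proposes, could replace it.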
Keywords/Search Tags: Big Data, Hadoop Monitoring, Performance Tuning, Execution Time Modeling