Performance Monitoring And Analysis On Hadoop-based Data Analysis Platform

Posted on: 2018-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ling
Full Text: PDF
GTID: 2348330518995453
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet industry, the era of "Big Data" has arrived. Hadoop, with its high fault tolerance, reliability, scalability, and efficiency, together with its low cost and simplicity, stands out in the field of massive data processing. However, as Hadoop clusters grow in size and serve more users, operating and maintaining them becomes increasingly difficult. Real-time performance monitoring and analysis of Hadoop clusters is therefore needed to keep cluster performance high.

In this thesis, we first give a brief overview of cluster monitoring metrics and monitoring techniques, and then design and implement a performance monitoring system for a Hadoop-based data analysis platform according to the needs of cluster operation and maintenance personnel. The system lets these personnel follow the cluster state, the operating status of each component, and the resource usage of each node in real time, so that they can handle cluster warnings promptly and keep the cluster running efficiently.

Second, we collect and analyze data on HDFS data distribution and data access. We find that data distribution is unbalanced and that data access on a DataNode is consistent with the trend in that DataNode's performance resource consumption. We therefore propose a data distribution optimization strategy and study the impact of data distribution on HDFS data access and running jobs.

Finally, experiments lead to the following conclusions. The Balancer can optimize data distribution until it is balanced. The more balanced the data distribution, the shorter the data access time and job execution time. Moreover, as the number of concurrent users and concurrent jobs increases, data distribution has a growing influence on file access time and job execution time.
Keywords/Search Tags: performance monitoring, balanced data distribution, data access, Hadoop
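
The abstract does not state how the thesis quantifies "balanced data distribution". As a minimal sketch, the Python below assumes the criterion used by the HDFS Balancer: a cluster counts as balanced when each DataNode's utilization (used space as a percentage of its capacity) differs from the overall cluster utilization by no more than a threshold, 10 percent by default. The `DataNodeUsage` class and both function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataNodeUsage:
    """Hypothetical record of one DataNode's disk usage (not an HDFS API)."""
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def utilization(self) -> float:
        # Used space as a percentage of this node's capacity.
        return 100.0 * self.used_bytes / self.capacity_bytes


def cluster_utilization(nodes: List[DataNodeUsage]) -> float:
    # Overall utilization: total used space over total capacity, in percent.
    total_used = sum(n.used_bytes for n in nodes)
    total_capacity = sum(n.capacity_bytes for n in nodes)
    return 100.0 * total_used / total_capacity


def unbalanced_nodes(nodes: List[DataNodeUsage],
                     threshold_pct: float = 10.0) -> List[str]:
    # Assumed criterion: a node is unbalanced when its utilization deviates
    # from the cluster utilization by more than threshold_pct points.
    avg = cluster_utilization(nodes)
    return [n.name for n in nodes if abs(n.utilization - avg) > threshold_pct]
```

In practice the per-node figures could be read from the output of `hdfs dfsadmin -report`, and an empty result from `unbalanced_nodes` would correspond to the balanced state that the Balancer aims for at the same threshold.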