
Research On Real-time Monitoring And Diagnosis Of Abnormal Nodes In Hadoop Clusters

Posted on: 2019-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Tian
Full Text: PDF
GTID: 2438330563957667
Subject: Computer technology
Abstract/Summary:
In 2006, Google released the MapReduce distributed computing framework to extract useful information from massive datasets. The framework divides big data evenly into small blocks, distributes those blocks to individual nodes in a cluster for computation, and finally collects and merges the results. By turning serial computation into parallel computation, it meets the analysis requirements of massive data. Hadoop, one implementation of the MapReduce framework, has been deployed by a large number of companies and organizations. However, as data volumes grow, small clusters can no longer meet demand, and to overcome this resource bottleneck more and more machines are added to the cluster to participate in the computation. The more nodes a cluster contains, the higher its operating cost, and the harder it becomes to locate and analyze anomalous node behavior. A node crash is easy to detect; problems that silently degrade runtime efficiency without crashing the node are much harder to locate. When such problems occur, job progress is seriously delayed, leading to considerable losses. Therefore, the sooner these problems are discovered, the sooner corrective measures can be taken.

To reduce the impact of such anomalies in production, this thesis presents a method for real-time detection and diagnosis of abnormal nodes in a Hadoop cluster. The method relies on the similarity of node behavior under normal conditions. We extract task-status information from the Hadoop runtime logs, convert the number of completed reduce tasks into an equivalent number of map tasks based on execution time, and then use statistical analysis to decide whether a node is behaving normally. Once an anomaly is detected, a root-cause localization step collects and analyzes operating-system-level performance metrics to identify the underlying cause.

Based on these two methods, a real-time detection and analysis system for abnormal Hadoop nodes is built with Spark Streaming, a streaming data-analysis tool, and used to validate the accuracy and efficiency of the proposed detection method. Because the execution time of a map task varies with task size, map-task completion degree is, for the first time, used to evaluate the real-time performance of detection. Finally, a series of experiments demonstrates both the timeliness and the effectiveness of the method.
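To make the detection idea concrete, the sketch below illustrates the two steps described above: converting completed reduce tasks into map-task equivalents via average execution times, and flagging nodes whose progress deviates from the rest of the cluster. This is a minimal offline illustration, not the thesis's implementation (which runs on Spark Streaming); the record fields, the conversion ratio, the robust median/MAD scoring, and the threshold k are illustrative assumptions.

```python
# Minimal sketch of the node-anomaly detection idea.
# Assumptions (not from the thesis): per-node task counts have already been
# parsed from the Hadoop runtime logs into dicts; field names, the
# reduce-to-map conversion, the median/MAD score, and k are illustrative.
from statistics import median

def reduce_to_map_equivalents(node_stats, avg_map_time, avg_reduce_time):
    """Convert completed reduce tasks into 'map-equivalent' units using the
    ratio of average execution times, so all nodes share one progress scale."""
    ratio = avg_reduce_time / avg_map_time if avg_map_time else 1.0
    return {
        node: s["maps_done"] + s["reduces_done"] * ratio
        for node, s in node_stats.items()
    }

def find_abnormal_nodes(node_stats, avg_map_time, avg_reduce_time, k=3.0):
    """Flag nodes whose map-equivalent progress lags the cluster median by
    more than k robust standard deviations, relying on the assumption that
    healthy nodes behave similarly."""
    progress = reduce_to_map_equivalents(node_stats, avg_map_time, avg_reduce_time)
    values = list(progress.values())
    if len(values) < 2:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-9
    robust_sigma = 1.4826 * mad  # scale MAD to a std-dev-like unit
    return [node for node, p in progress.items() if (med - p) / robust_sigma > k]

if __name__ == "__main__":
    stats = {
        "node-01": {"maps_done": 120, "reduces_done": 10},
        "node-02": {"maps_done": 118, "reduces_done": 11},
        "node-03": {"maps_done": 121, "reduces_done": 10},
        "node-04": {"maps_done": 119, "reduces_done": 12},
        "node-05": {"maps_done": 35,  "reduces_done": 2},   # lagging node
    }
    print(find_abnormal_nodes(stats, avg_map_time=12.0, avg_reduce_time=48.0))
```

In the real-time system, an equivalent per-node score would be recomputed on each Spark Streaming micro-batch as new log records arrive, and a flagged node would then trigger the root-cause localization step over its operating-system-level metrics.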
Keywords/Search Tags:Hadoop, clusters, abnormal nodes, real-time detection, root-cause