
Key Issues On Monitoring And Auditing Based On The Hadoop Platform

Posted on: 2018-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
GTID: 2348330512988028
Subject: Computer technology
Abstract/Summary:
With the spread of large-scale data applications across vertical business areas, a growing number of enterprises use Hadoop clusters as their main tool for data storage and processing, because Hadoop is efficient, scalable, and low in cost. However, the resources in a Hadoop cluster are diverse and complex, and its nodes are prone to failure, which poses a great challenge for resource monitoring. In addition, Hadoop's security mechanisms are relatively weak and static: they do not monitor user behavior, which leaves the cluster vulnerable to hidden security threats and makes data safety difficult to guarantee. To address user activity monitoring, this thesis proposes an anomaly detection method for Hadoop clusters that improves the data security of the cluster. Building on an analysis of existing work, it also proposes an integrated monitoring framework that overcomes the shortcomings of existing frameworks.

For user activity monitoring, we study the shortcomings of the traditional principal component analysis (PCA) algorithm, namely its memory limitations and low processing efficiency on large-scale data. We decompose the calculation of the covariance matrix and parallelize it with MapReduce, which addresses both problems. We then analyze the behavioral model of HDFS data operations and propose an abnormal behavior detection method based on PCA; the behavioral model of HDFS data operations is simulated and extracted with our algorithm. To decide whether the current behavior is abnormal, we compare the current user's behavior pattern with the historical normal behavior pattern obtained through training, using Euclidean distance as the metric. This method reduces the redundancy of data features, improves the efficiency of data processing, and achieves better detection results.

For resource monitoring, we study the advantages and disadvantages of existing monitoring frameworks. We use Ganglia to collect monitoring metrics and pass them to Nagios through the data extraction module developed in this thesis, implement the hierarchical status display required by the Nagios framework, and arrive at a framework that combines monitoring and alarming for a cluster. The framework draws on the strengths of both tools, compensating for Ganglia's lack of an alarm function and for the limitations of Nagios's monitoring function. Because the data extraction module spares Nagios the overhead of running its own monitoring services, the result is a lightweight framework for monitoring Hadoop clusters.

This thesis designs and implements the resource monitoring framework and the abnormal user behavior monitoring system for Hadoop clusters, and verifies the correctness and validity of the integrated framework and the detection method.
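The abstract does not give the details of the covariance decomposition, so the following is only a minimal sketch of one common way such a computation can be split into map and reduce steps. It assumes each partition is summarized by its row count, column sums, and sum of outer products, and that these partial statistics are added together before the covariance is formed. It runs in plain Python rather than on Hadoop MapReduce, and every name in it (`map_partition`, `reduce_stats`, `covariance`, `is_anomalous`, the `threshold` parameter) is hypothetical, not taken from the thesis. The Euclidean distance check at the end illustrates the comparison metric only; the thesis's actual behavior patterns and thresholds are not specified in the abstract.

```python
from functools import reduce
from math import sqrt


def map_partition(rows):
    """Map step (assumed form): summarize one partition as
    (row count, column sums, sum of outer products)."""
    d = len(rows[0])
    n = 0
    s = [0.0] * d
    ss = [[0.0] * d for _ in range(d)]
    for x in rows:
        n += 1
        for i in range(d):
            s[i] += x[i]
            for j in range(d):
                ss[i][j] += x[i] * x[j]
    return n, s, ss


def reduce_stats(a, b):
    """Reduce step: partial statistics from two partitions simply add."""
    n = a[0] + b[0]
    s = [u + v for u, v in zip(a[1], b[1])]
    ss = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, s, ss


def covariance(stats):
    """Sample covariance from merged statistics:
    (SS - n * mean * mean^T) / (n - 1)."""
    n, s, ss = stats
    d = len(s)
    mean = [v / n for v in s]
    return [[(ss[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(d)]
            for i in range(d)]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_anomalous(current, baseline, threshold):
    """Hypothetical rule: flag behavior whose pattern lies farther than
    `threshold` from the trained normal pattern."""
    return euclidean_distance(current, baseline) > threshold


# Two partitions of a toy data set with two features per record.
partitions = [
    [[1.0, 2.0], [2.0, 4.0]],
    [[3.0, 6.0], [4.0, 9.0]],
]
cov = covariance(reduce(reduce_stats, map(map_partition, partitions)))
```

No partition needs the full data set in memory, which is the property that makes this form suitable for parallel execution. The sum-of-products formula can lose precision when the means are large relative to the variances, so a production version would likely need a more numerically stable merge.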
Keywords/Search Tags: cluster monitoring, Principal Component Analysis, Audit Log, anomaly detection