
Hadoop Platform Operations: Research and Implementation of Key Technologies

Posted on: 2016-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wan
Full Text: PDF
GTID: 2298330467491786
Subject: Computer technology
Abstract/Summary:
In recent years, the rapid growth of network connectivity, and especially the arrival of the mobile Internet, has brought more and more people online. These users generate large volumes of data, and the volume grows exponentially, which makes storing and processing the data very difficult. Hadoop emerged to address this: HDFS and MapReduce help enterprises solve the problems of storing and computing over large data sets. As an open-source distributed framework, Hadoop is sought after by many companies, and the Hadoop clusters of many large companies have reached thousands of nodes. Operating the Hadoop platform effectively can therefore greatly improve work efficiency.
This thesis studies several key techniques in Hadoop platform operations: the NameNode single point of failure, Hadoop cluster monitoring, and the tuning of Hive SQL running on the Hadoop platform.
First, the NameNode single point of failure is addressed mainly with a SecondaryNameNode + NFS approach. The SecondaryNameNode mainly merges the log file with the image file, and NFS backs up the HDFS metadata to a remote directory.
Second, Hadoop cluster monitoring is handled by developing a Hadoop cluster monitoring system with three modules: a Job monitoring module, an HDFS monitoring module, and a node monitoring module. The Job monitoring module covers running jobs and their progress, failed jobs, and completed jobs. The HDFS monitoring module mainly reports HDFS usage, both for the cluster as a whole and for each node in the cluster. The node monitoring module mainly tracks changes in CPU and memory on every node in the Hadoop cluster.
Third, Hive SQL tasks running on the Hadoop platform are tuned by analyzing the characteristics of the Hive statements and of the data they use, and then, depending on the situation, setting map-side or reduce-side parameters.
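The abstract does not say which parameters the thesis sets, so the following is only a minimal sketch of the general idea of choosing Hive settings from the characteristics of a statement and its data. The `QueryProfile` class, the `suggest_settings` function, and both thresholds are hypothetical names and values introduced for illustration. The three Hive properties are standard Hive configuration options, but their selection here is an assumption, not the author's method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryProfile:
    """Hypothetical summary of a Hive statement and the data it reads."""
    has_group_by: bool
    has_join: bool
    smallest_join_table_bytes: int  # size of the smallest joined table
    key_skew_ratio: float           # rows for the hottest key / average rows per key


# Illustrative thresholds only (assumptions); real values depend on the cluster.
SMALL_TABLE_BYTES = 25_000_000
SKEW_RATIO_LIMIT = 10.0


def suggest_settings(profile: QueryProfile) -> List[str]:
    """Return Hive SET statements chosen from the query's characteristics."""
    settings: List[str] = []
    if profile.has_group_by:
        # Map side: pre-aggregate in the mappers to shrink the shuffled data.
        settings.append("SET hive.map.aggr=true;")
        if profile.key_skew_ratio > SKEW_RATIO_LIMIT:
            # Skewed GROUP BY keys: let Hive split the aggregation into two jobs.
            settings.append("SET hive.groupby.skewindata=true;")
    if profile.has_join and profile.smallest_join_table_bytes <= SMALL_TABLE_BYTES:
        # Map side: allow conversion to a map join when one table is small.
        settings.append("SET hive.auto.convert.join=true;")
    return settings


if __name__ == "__main__":
    profile = QueryProfile(True, True, 1_000_000, 50.0)
    for statement in suggest_settings(profile):
        print(statement)
```

The returned statements would be placed before the query in a Hive session. A profile with neither a GROUP BY nor a join yields an empty list, leaving the defaults untouched.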
Keywords/Search Tags: Hadoop, Hive, tuning and monitoring