Design And Implementation Of Energy Optimization System For Hadoop Cluster Based On Container

Posted on: 2020-08-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Wu | Full Text: PDF
GTID: 2428330611999660 | Subject: Computer technology
Abstract/Summary:
With the development of network technology, the explosive growth of data has prompted the emergence of a number of distributed systems dedicated to processing and storing large data sets, of which the Hadoop big data platform is one of the best known. In recent years, GPUs have been adopted to cope with the pressure of massive data, and distributed systems, especially those with multiple GPUs, have become increasingly important. While people enjoy the convenience these systems bring, their energy consumption has also become a focus of attention. Because energy saving was not considered in Hadoop's initial design, excessive energy consumption in running Hadoop clusters is a serious problem. Hadoop has recently introduced support for processing tasks on GPUs, and the GPU's share of a node's total energy consumption cannot be ignored, which aggravates the problem. Studying how to reduce the energy consumption of Hadoop clusters is therefore of great significance for improving the efficiency of Internet enterprises and for responding to the national call for energy conservation and emission reduction.

In this thesis, through in-depth analysis of the overall architecture and operating mechanism of the Hadoop platform, the Hadoop cluster is divided into an HDFS cluster and a YARN cluster. Each node of the cluster is encapsulated in a Docker container, which makes Hadoop nodes easy to manage for the purpose of energy saving. The creation of Hadoop clusters and the scheduling of nodes are then implemented on the container orchestration platform Kubernetes, so that a newly created Hadoop node is automatically scheduled onto an appropriate host. A private registry is also built to store the image files; when a Hadoop cluster is created, it only needs to pull images from the private registry, which is convenient and fast. Finally, the monitoring module and the auto-scaling module of the Hadoop cluster are implemented. The monitoring module tracks the resource utilization of the cluster, such as CPU utilization and GPU utilization. The auto-scaling module uses the information obtained from the monitoring module to adjust the number of Hadoop worker nodes automatically, keeping the load of the whole cluster stable and thereby optimizing the cluster's energy consumption.

The energy consumption optimization system is implemented on three servers, and its resource monitoring and node auto-scaling functions are tested. Compared with a Hadoop cluster running directly on physical machines in the traditional way, the system is found to have little impact on the performance of the Hadoop cluster. In addition, to verify the energy-saving effect of the system in a large-scale cluster environment, simulation experiments are carried out on the CloudSim simulation platform. The results show that the energy consumption optimization system implemented in this thesis effectively reduces the energy consumption of the Hadoop cluster.
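
The abstract does not state the rule the auto-scaling module uses, so the following is only a minimal sketch of how monitored CPU and GPU utilization could drive the number of worker nodes. The `NodeSample` structure, the `desired_workers` function, the 30%/80% thresholds, the one-node step, and the worker limits are all assumptions made for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeSample:
    """One monitoring sample for a Hadoop worker node (hypothetical structure)."""
    cpu_util: float  # fraction of CPU in use, in [0, 1]
    gpu_util: float  # fraction of GPU in use, in [0, 1]


def desired_workers(samples, low=0.30, high=0.80, min_workers=1, max_workers=8):
    """Return the worker count requested by a simple threshold rule.

    All thresholds and limits are illustrative assumptions.
    """
    current = len(samples)
    if current == 0:
        return min_workers
    # Treat the busier of the two resources as the cluster load signal.
    load = max(mean(s.cpu_util for s in samples),
               mean(s.gpu_util for s in samples))
    if load > high:
        target = current + 1   # overloaded: add one worker
    elif load < low:
        target = current - 1   # underused: release one worker to save energy
    else:
        target = current       # inside the band: leave the cluster unchanged
    # Clamp to the allowed cluster size.
    return max(min_workers, min(max_workers, target))
```

In a setup like the one described, the returned count might be applied by changing the replica count of the Kubernetes workload that runs the Hadoop worker containers; that step is omitted here because the abstract gives no detail on it.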
Keywords/Search Tags: energy optimization, Docker, Kubernetes, Hadoop, GPU