
Research On Energy Saving Of Hadoop Cluster Based On Neural Network LSTM

Posted on: 2019-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Han
GTID: 2428330578972628
Subject: Software engineering
Abstract/Summary:
With the development of cloud computing, data centers are growing in both scale and number, and their energy consumption is becoming more and more expensive. Hadoop is currently a widely used data processing platform, increasingly prominent in finance, education, and other fields, and its data center deployments are very large. How to reduce its power consumption, and thereby cut costs while guaranteeing service quality, is a hot topic in current research. Starting from practical application, this thesis analyzes the principles of YARN and of HDFS data block storage in a traditional Hadoop cluster. The mainstream Hadoop YARN scheduling strategies focus on resource allocation but ignore dynamic changes in the cluster's task processing, so cluster nodes spend long periods at low load and waste energy. In addition, a large proportion of the data blocks stored on data nodes become cold data, yet they still occupy the storage resources of compute nodes. To address these problems, this thesis carries out the following work based on the system structure and principles of Hadoop:
(1) An energy-saving system scheme for Hadoop clusters is designed. It includes data collection from the underlying cluster nodes, an energy consumption model and node load prediction in the middle layer, and job scheduling in the upper layer. The scheme combines the advantages of open source tools and frameworks at each layer, so that the overall Hadoop scheme achieves a better energy-saving effect.
(2) The workload of a Hadoop cluster is at a very low level in most cases, yet the nodes keep running at low load. This thesis proposes HES-scheduler, a task scheduling algorithm based on node load state prediction. Based on the number of tasks submitted to the cluster, the algorithm puts nodes with lower load to sleep in order to save energy. The scheduling process is divided into two phases. First, an LSTM is trained on the nodes' historical data to obtain a prediction model; the model then predicts the resource usage of the cluster nodes for the next time cycle, and the nodes are divided into an active node queue and a sleep node queue according to a default threshold. Second, job scheduling is chosen on the principle of optimal energy consumption. Experimental comparison with the FIFO, Capacity, and Fair scheduling strategies of Hadoop YARN shows that this strategy has a good effect on energy conservation.
(3) Based on data block access patterns, this thesis improves the data block storage mode of the Hadoop cluster and puts forward the HES-storage data block storage strategy. According to the predicted state of each node and a preset threshold, the strategy divides the cluster into a hot region and a cold region. The hot region adopts the default Hadoop storage strategy, which helps improve service quality. The cold region uses centralized storage to increase its data block storage capacity. "Cold" data in the hot region is periodically migrated to the buffer queue of the cold region; when the access frequency of the data in the buffer queue reaches the node dormancy threshold, it is placed in the sleep queue. Finally, the change in the number of dormant nodes over a period of time is statistically analyzed, and the energy consumed in that period is quantified according to the energy consumption model, which demonstrates the energy-saving effect of this strategy.
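As a rough illustration of two steps named in the abstract, the threshold-based split of nodes into active and sleep queues in (2) and the quantification of energy from dormant-node counts in (3), the following Python sketch may help. It is not the thesis's HES-scheduler or its energy model. The names `NodeForecast`, `partition_nodes`, and `estimate_energy_wh`, the threshold of 0.2, and the wattage figures are all hypothetical, and the predicted loads are assumed to come from an already trained LSTM model that is not shown here.

```python
from dataclasses import dataclass


@dataclass
class NodeForecast:
    """Hypothetical record: one node's predicted load for the next cycle."""
    name: str
    predicted_load: float  # predicted resource utilization in [0, 1]


def partition_nodes(forecasts, sleep_threshold=0.2):
    """Split nodes into (active, sleep) queues by predicted load.

    Assumption: a node whose predicted load is below the threshold is a
    candidate for sleep; the threshold value here is illustrative only.
    """
    active = [f.name for f in forecasts if f.predicted_load >= sleep_threshold]
    sleep = [f.name for f in forecasts if f.predicted_load < sleep_threshold]
    return active, sleep


def estimate_energy_wh(sleeping_counts, total_nodes, interval_hours,
                       active_watts=200.0, sleep_watts=10.0):
    """Estimate energy in watt-hours over a series of equal intervals.

    Assumption: a constant per-node power draw in each state. The wattage
    values are placeholders, not measurements from the thesis.
    """
    energy = 0.0
    for sleeping in sleeping_counts:
        awake = total_nodes - sleeping
        energy += (awake * active_watts + sleeping * sleep_watts) * interval_hours
    return energy


if __name__ == "__main__":
    forecasts = [
        NodeForecast("node-1", 0.75),
        NodeForecast("node-2", 0.05),
        NodeForecast("node-3", 0.20),
    ]
    active_queue, sleep_queue = partition_nodes(forecasts)
    print("active:", active_queue)
    print("sleep:", sleep_queue)
    # Dormant-node counts observed in two consecutive one-hour intervals.
    print("energy (Wh):", estimate_energy_wh([0, 2], total_nodes=4, interval_hours=1.0))
```

A fixed threshold and constant per-state power draw keep the sketch small. A real deployment would presumably need a measured power model and some hysteresis so that nodes do not flip between the two queues every cycle.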
Keywords/Search Tags: Hadoop cluster, LSTM, Energy saving scheduling, Energy saving storage