Study On The Energy-conserving Strategies Of File Storage For News Big Data

Posted on: 2016-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2308330479983254
Subject: Computer software and theory
Abstract/Summary:
With the advent of the big data era, data centers, which provide its basic services, have been booming. Behind the prosperity of the big data industry, growing energy consumption, environmental pollution, consumption of land resources, and other problems have become inevitable. As power consumption multiplies, the cost of operation and management also increases. How to reduce the energy consumption of data centers, especially that of server clusters, has therefore become a major concern in both academia and industry.

This thesis studies a news website whose data volume is growing rapidly, and several regularities were found through in-depth analysis of the website's access logs. These regularities are then introduced into the Hadoop Distributed File System (HDFS), currently the most widely used file system. By optimizing the traditional system, an energy-conserving HDFS with self-balanced data distribution is proposed.

Specifically, four policies are adopted to save energy. First, the data node partition policy logically divides the entire cluster into a Cold zone and a Hot zone so that different management methods can be applied to each. Second, the largest-remaining-space node matching policy can be implemented in two ways, depending on how the balance of data distribution is handled: the Active State Node Priority (ASNP) matching strategy, which achieves better energy-saving efficiency, and the Lower than Average utilization rate Node Priority (LANP) matching strategy, which maintains a balanced data distribution. Third, the file migration policy ensures efficient access to news data during its popular period while lightening the load on nodes in the hot zone. Finally, the node standby policy transitions cold-zone nodes without tasks to standby mode to reduce overall energy consumption.

To observe and analyze the results of the energy-saving file storage policies in HDFS, a simulation platform for research on energy-saving HDFS was developed. The platform simulates the scheduling process by which news files are created and accessed, and the user can choose among pluggable energy-saving policy modules. Finally, the platform outputs the computed results.

One month of access logs from Wiki English-language news, after data preprocessing, was selected as the test data set. The simulation results show that the improved HDFS with energy-saving policies achieves a 20%-34% reduction in energy cost, and that the data distribution of the cluster is self-balanced when the LANP strategy is adopted. Additionally, more than 99.8% of all read requests are not affected by the policies, which demonstrates the feasibility of the proposed energy-saving policies.
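
The two node matching strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `DataNode` fields, the function names `pick_asnp` and `pick_lanp`, and the fallback behavior when no preferred node fits are all assumptions made for illustration. The only points taken from the abstract are that both strategies select the node with the largest remaining space, that ASNP gives priority to active nodes, and that LANP gives priority to nodes whose utilization is below the cluster average.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataNode:
    """Hypothetical model of a data node (fields are illustrative assumptions)."""
    name: str
    capacity: int   # total storage units
    used: int       # storage units already occupied
    active: bool    # False means the node is in standby mode

    @property
    def remaining(self) -> int:
        return self.capacity - self.used

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def pick_asnp(nodes: List[DataNode], size: int) -> Optional[DataNode]:
    """Active State Node Priority: prefer active nodes, then largest remaining space.

    Assumption: standby nodes are considered only if no active node can hold the file.
    """
    fitting = [n for n in nodes if n.remaining >= size]
    if not fitting:
        return None
    active = [n for n in fitting if n.active]
    pool = active or fitting
    return max(pool, key=lambda n: n.remaining)


def pick_lanp(nodes: List[DataNode], size: int) -> Optional[DataNode]:
    """Lower than Average utilization Node Priority.

    Prefer nodes whose utilization is below the cluster average, then take the
    one with the largest remaining space. Assumption: if no such node fits,
    fall back to any node that can hold the file.
    """
    fitting = [n for n in nodes if n.remaining >= size]
    if not fitting:
        return None
    average = sum(n.utilization for n in nodes) / len(nodes)
    below = [n for n in fitting if n.utilization < average]
    pool = below or fitting
    return max(pool, key=lambda n: n.remaining)


# Example cluster with made-up numbers, for illustration only.
cluster = [
    DataNode("a", capacity=100, used=80, active=True),
    DataNode("b", capacity=100, used=20, active=False),
    DataNode("c", capacity=100, used=50, active=True),
]
print(pick_asnp(cluster, 10).name)  # prints "c": the active node with the most free space
print(pick_lanp(cluster, 10).name)  # prints "b": the only node below average utilization
```

In this toy cluster ASNP avoids waking the standby node `b`, which matches the energy-saving emphasis the abstract attributes to it, while LANP directs the file to the least-utilized node, which matches its stated goal of balanced data distribution.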
Keywords/Search Tags: file storage, energy-saving policies, balanced data distribution, simulation platform