Research On Key Technologies Of Large-Scale Low-Power Data Storage System

Posted on: 2020-10-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M C Chi
GTID: 1368330605481303
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid advance of informatization in today's society, data has become a critical resource for daily work and life. At the same time, the volume of global data is growing sharply. Statistics show that the total amount of data generated globally in 2018 was 33 ZB, and it is expected to reach 175 ZB by 2025. As data volume grows, the demand for massive data storage is rising rapidly. Meanwhile, the high energy consumption caused by massive data storage is becoming an increasingly serious problem. According to statistics, the total annual power consumption of China's data centers in 2017 reached 130 billion kWh, far more than the total annual power generation of the Three Gorges Dam in that year (97.6 billion kWh). Storage devices account for 25-35% of the total energy consumption of data centers. In addition, the heat generated by storage devices adds to the load on cooling systems, which further increases the energy consumption of the data center. Reducing the energy consumption of storage systems effectively is therefore of great significance to the environment.

In data centers, only 10-15% of the total data is frequently accessed, while the remaining data is cold data. Traditional servers must be highly available, which makes hardware reliability and durability critically important. Moreover, to provide high performance, high-frequency CPUs are adopted, and hard disk drives spin at high speed all the time to deliver low access latency and high data transfer rates. Storing cold data on traditional servers is a poor fit, because such hardware is over-provisioned for this workload. This dissertation therefore focuses on key technologies of a large-scale low-power data storage system. It aims to store cold data efficiently in terms of energy and cost, while the system still provides reasonable performance and
high reliability and fault tolerance. The main contents and innovations of this dissertation are as follows:

(1) Various levels of parallelism in high-performance CRC algorithms are investigated. As a result, multi-dataflow and multi-thread parallel CRC algorithms are proposed, which make full use of modern processors through instruction-level and thread-level parallelism, respectively. Large-scale storage systems contain a large amount of complex software and hardware, and the likelihood of data corruption increases with the complexity of the system. Undetected data corruption (i.e., silent data corruption) can cause unexpected errors, which seriously affects the reliability of the system. Cyclic redundancy check (CRC), a widely used method for checking data integrity, has good error-detection performance. In this dissertation, two parallel CRC algorithms are proposed to improve CRC performance further. First, a fine-grained algorithm executes the CRC computation in an interleaved manner, so that multiple independent data flows can be processed simultaneously. This algorithm exploits instruction-level parallelism, which triples and doubles the performance of the existing Slicing-by-4 and Slicing-by-8 algorithms, respectively. Second, a coarse-grained algorithm processes data in parallel by parallelizing a family of serial CRC generating algorithms. This algorithm exploits thread-level parallelism and can therefore make full use of multi-core computing capability. As a result, it achieves a speedup that is almost equal to the number of threads used.

(2) Hardware-accelerated algorithms based on the Intel SSE and AVX2 instruction sets are proposed to improve the performance of Reed-Solomon coding. Data redundancy is an effective method to improve the reliability and fault tolerance of the storage system. The original data can be recovered from redundant data when data corruption occurs. However, in order to improve the reliability of
the storage system, data redundancy introduces additional costs, since more disk space is required to hold the redundant data, leading to an increase in hardware costs and energy consumption. It is therefore of great significance for the storage system to decrease the proportion of redundant data without sacrificing reliability. Data replication is a simple and effective way to improve the reliability of data. However, it leads to heavy storage overhead and low storage efficiency. Compared with data replication, erasure coding produces less redundant data while maintaining the same degree of reliability. The Reed-Solomon coding technique is investigated in this dissertation. As a result, two hardware-accelerated algorithms, named RS_SSE and RS_AVX2, are proposed, which are based on the Intel SSE and AVX2 instruction sets, respectively. The SSE and AVX2 instruction sets allow Galois Field arithmetic to be performed in parallel, so that the performance of Reed-Solomon coding is improved dramatically. The experimental results show that the coding efficiency of the RS_SSE and RS_AVX2 algorithms is 1.2 and 1.9 times that of the Jerasure library, respectively.

(3) A health state prediction model for hard disk drives based on a long short-term memory network is proposed, which uses historical data of SMART attributes to achieve high prediction performance. As the volume of data stored on the system increases, hard disk drive (HDD) failures become common, which seriously affects the reliability of the storage system. Effective prediction of HDD failures supports more reasonable planning and management, which is of great significance for the data reliability of storage systems. Currently, almost all HDDs support SMART technology, which monitors various indicators of the HDDs. In this dissertation, the time series data of SMART attributes are introduced into the prediction model, and a machine learning model based on long short-term memory (LSTM) is proposed to predict the health
state of HDDs. The health state of HDDs is divided into several levels, and the proposed model predicts the health state based on the remaining useful life (RUL) of HDDs. Since an HDD degrades gradually from normal to faulted, multi-level health state prediction describes this transition in more detail. Furthermore, the proposed model can use transfer learning to improve prediction performance on a small-scale dataset. Experimental results show that the proposed prediction model performs better than existing prediction models based on random forest and recurrent neural network.

(4) According to the characteristics of cold data, this dissertation designs and implements a large-scale low-power storage system. Since a significant fraction of the data residing in the storage system is cold data, how to manage and maintain this data optimally at low cost has become a pressing problem. In this dissertation, we design and implement a large-scale, reliable, energy- and cost-efficient storage system. Unlike traditional file systems, our file system aims to store cold data efficiently in terms of energy and cost. It is designed as a highly available distributed system: a cluster composed of a number of nodes with different roles. Metadata Nodes maintain the metadata of all files; Transfer Nodes handle encoding, decoding and caching; and Storage Nodes store encoded data for durability. Dedicated hardware is adopted for Storage Nodes to reduce power consumption. Each Storage Node is equipped with a low-power CPU and a power supply control unit for its HDDs. In addition, algorithms for namespace and disk space management are proposed to improve the performance of metadata operations. Experimental results show that the storage system achieves high performance in cold data storage. Owing to the proposed architecture, the vast majority of the HDDs, as many as 93.75%, can be powered
off under the normal workload, and the average power consumption per TB of data is 0.92-1.09 W.
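
The thread-level idea in contribution (1) can be illustrated with a small sketch. The abstract does not describe the dissertation's actual algorithm, so what follows is only one common way to parallelize CRC-32, stated here as an assumption: split the buffer into chunks, compute each chunk's CRC independently, then merge the partial results with a GF(2) matrix "combine" step in the style of zlib's crc32_combine. The names `crc32_combine` and `parallel_crc32` and the `workers` parameter are illustrative, and Python threads are used only to show the structure, not to reproduce the reported speedups.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

POLY = 0xEDB88320  # reflected CRC-32 polynomial, the one zlib.crc32 uses


def _times(mat, vec):
    """Multiply a 32x32 GF(2) matrix (stored as a list of columns) by a vector."""
    acc, i = 0, 0
    while vec:
        if vec & 1:
            acc ^= mat[i]
        vec >>= 1
        i += 1
    return acc


def _square(mat):
    """Square a GF(2) matrix: column n of M*M is M applied to column n of M."""
    return [_times(mat, col) for col in mat]


def crc32_combine(crc1, crc2, len2):
    """CRC of A||B, given crc1 = CRC(A), crc2 = CRC(B) and len2 = len(B) in bytes."""
    if len2 <= 0:
        return crc1
    # Operator that advances the CRC register over one zero bit.
    op = [POLY] + [1 << n for n in range(31)]
    for _ in range(3):          # 1 bit -> 2 -> 4 -> 8 bits (one zero byte)
        op = _square(op)
    while len2:                 # binary exponentiation over the byte count
        if len2 & 1:
            crc1 = _times(op, crc1)
        op = _square(op)
        len2 >>= 1
    return crc1 ^ crc2


def parallel_crc32(data, workers=4):
    """Chunk the buffer, CRC each chunk in a worker thread, then merge in order."""
    view = memoryview(data)
    size = max(1, -(-len(view) // workers))  # ceiling division
    chunks = [view[i:i + size] for i in range(0, len(view), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(zlib.crc32, chunks))
    crc = 0
    for part, chunk in zip(partial, chunks):
        crc = crc32_combine(crc, part, len(chunk))
    return crc
```

Calling `parallel_crc32(buf, workers=n)` should return the same value as `zlib.crc32(buf)` for any `n`; the merge step costs a fixed number of small matrix operations per chunk, so the per-chunk CRC work dominates for large buffers.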
Keywords/Search Tags: low-power, distributed storage system, Reed-Solomon coding, reliability, fault prediction
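
For the Reed-Solomon coding keyword and contribution (2), the following is a scalar sketch of Galois Field multiplication by a constant using two 16-entry nibble tables. This split-table form is a well-known way to map GF(2^8) multiplication onto byte-shuffle SIMD instructions, where each table lookup is applied to many bytes at once. The abstract does not say how RS_SSE and RS_AVX2 are implemented, so this is an assumption for illustration, not the author's method. The polynomial 0x11D and the names `gf_mul`, `nibble_tables`, `mul_region` and `encode_parity` are likewise assumptions.

```python
GF_POLY = 0x11D  # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Bitwise shift-and-add multiplication in GF(2^8) modulo GF_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # reduce as soon as the degree reaches 8
            a ^= GF_POLY
        b >>= 1
    return result


def nibble_tables(c):
    """Two 16-entry tables: c times each low nibble, c times each high nibble."""
    low = bytes(gf_mul(c, x) for x in range(16))
    high = bytes(gf_mul(c, x << 4) for x in range(16))
    return low, high


def mul_region(c, region):
    """Multiply every byte of region by c with two lookups and one XOR per byte.

    A SIMD version would do the same lookups on a whole register of bytes
    per instruction; here the loop is scalar for clarity.
    """
    low, high = nibble_tables(c)
    return bytes(low[b & 0x0F] ^ high[b >> 4] for b in region)


def encode_parity(coeffs, data_blocks):
    """One parity block: XOR over i of coeffs[i] * data_blocks[i] in GF(2^8)."""
    parity = bytearray(len(data_blocks[0]))
    for c, block in zip(coeffs, data_blocks):
        for i, p in enumerate(mul_region(c, block)):
            parity[i] ^= p
    return bytes(parity)
```

The split works because multiplication distributes over XOR: a byte `b` equals its low nibble XOR its high nibble shifted left by four, so `c*b` is the XOR of the two table entries. With all coefficients set to 1, `encode_parity` reduces to plain XOR parity.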