
Design And Implementation Of HDFS Optimization Method Based On Erasure Coding

Posted on: 2019-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y P You
Full Text: PDF
GTID: 2428330566995761
Subject: Software engineering
Abstract/Summary:
The Hadoop Distributed File System (HDFS) ensures data reliability through default 3x replication, so its storage-space utilization is relatively low. As data volumes grow rapidly, and especially for cold data, the extra overhead of traditional HDFS storage becomes ever larger. HDFS therefore urgently needs a new, optimized storage mechanism that saves storage space while still guaranteeing data reliability. The HDFS Erasure Coding (HDFS EC) project was created for this purpose: it uses erasure codes to store HDFS files and to recover lost data.

Starting from the problems of traditional HDFS storage and of HDFS RAID, HDFS EC introduces erasure coding, a striping layout, and a hierarchical naming protocol. Based on an analysis of these existing problems, optimization targets for HDFS EC are proposed. After comparing the characteristics of the striping layout and the contiguous layout for HDFS EC encoding, the striping layout was chosen and a new file storage unit was designed for it. On this basis, a hierarchical naming protocol is used to locate the internal blocks of a block group and to reduce memory pressure on the NameNode. Dedicated read/write classes and auxiliary classes were designed for HDFS EC, both reusing part of the existing HDFS read/write logic and implementing the basic HDFS EC read and write paths.

To ensure a rich and extensible set of codecs in HDFS EC, ErasureCodec, a loosely coupled, pluggable, and modular codec framework, is designed together with its related classes, and a Reed-Solomon codec is implemented in Java. In addition, an implementation that calls the Intel ISA-L codec library is introduced. Based on this codec framework and algorithm, the data recovery technique in HDFS EC is described.

Finally, test results before and after enabling HDFS EC are given against the proposed optimization targets, and the results meet expectations. HDFS EC ensures data reliability while reducing storage overhead, making it more flexible and convenient for users to store hot and cold data. At the same time, HDFS EC effectively alleviates the small-file storage problem, and the pluggable codec framework makes it easy for users to introduce custom codec technology. HDFS EC is of great importance for advancing HDFS in industrial applications.
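The striping layout and the hierarchical naming protocol can be illustrated with a small sketch. The following is illustrative only: the class and constant names are hypothetical, and the cell size and ID bit layout are assumptions, not constants taken from the thesis or from Hadoop itself. The idea is that a file is cut into fixed-size cells written round-robin across the data blocks of an RS(6,3) block group (9 stored blocks for 6 blocks of data, i.e. 1.5x overhead versus 3x for replication), and each internal block's ID is derived from the group's base ID plus a small index in the low bits, so the NameNode only needs to track one ID per block group.

```java
// Sketch of HDFS EC striping-layout bookkeeping (illustrative; all names and
// constants are assumptions, not the thesis's actual code).
public class StripingLayout {
    static final int DATA_BLOCKS = 6;       // RS(6,3): 6 data units
    static final int PARITY_BLOCKS = 3;     // RS(6,3): 3 parity units
    static final int CELL_SIZE = 64 * 1024; // striping cell size (assumed)
    // Low 4 bits of a block ID are reserved for the internal-block index,
    // which is enough for the 9 internal blocks of an RS(6,3) group.
    static final int INDEX_BITS = 4;

    // Which data block of the group a given file offset lands in:
    // cells are laid out round-robin across the data blocks.
    static int internalBlockIndex(long fileOffset) {
        return (int) ((fileOffset / CELL_SIZE) % DATA_BLOCKS);
    }

    // Hierarchical naming: derive an internal block's ID from the
    // block group's base ID plus the internal-block index.
    static long internalBlockId(long groupBaseId, int blockIndex) {
        return groupBaseId | blockIndex;
    }

    // Recover the group's base ID from any internal block ID by
    // masking off the index bits.
    static long groupBaseId(long internalBlockId) {
        return internalBlockId & ~((1L << INDEX_BITS) - 1);
    }

    public static void main(String[] args) {
        long base = 0x10_0000L; // hypothetical block-group base ID
        System.out.println(internalBlockIndex(0));             // 0: first data block
        System.out.println(internalBlockIndex(6L * CELL_SIZE)); // wraps back to 0
        System.out.println(groupBaseId(internalBlockId(base, 7)) == base); // true
    }
}
```

Because every internal block ID collapses back to one group base ID, the NameNode's memory footprint stays proportional to the number of block groups, not the number of internal blocks, which is the memory-pressure reduction described above.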
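The pluggable codec framework can likewise be sketched as an interface plus a registry. This is a minimal illustration under stated assumptions, not the thesis's actual API: the names ErasureCodec, XorCodec, and CodecRegistry are hypothetical, and a toy single-parity XOR codec stands in for the Reed-Solomon implementation to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative pluggable codec framework (hypothetical names): codecs register
// under a name, so a custom codec can be introduced without touching the
// core read/write classes.
interface ErasureCodec {
    byte[] encodeParity(byte[][] dataUnits);               // produce one parity unit
    byte[] recoverUnit(byte[][] survivors, byte[] parity); // rebuild one lost unit
}

// Toy single-parity codec standing in for Reed-Solomon: parity is the XOR of
// all data units, and any single lost unit is the XOR of the surviving units
// with the parity.
class XorCodec implements ErasureCodec {
    public byte[] encodeParity(byte[][] dataUnits) {
        byte[] parity = new byte[dataUnits[0].length];
        for (byte[] unit : dataUnits)
            for (int i = 0; i < parity.length; i++) parity[i] ^= unit[i];
        return parity;
    }
    public byte[] recoverUnit(byte[][] survivors, byte[] parity) {
        byte[] lost = parity.clone();
        for (byte[] unit : survivors)
            for (int i = 0; i < lost.length; i++) lost[i] ^= unit[i];
        return lost;
    }
}

// Minimal registry: the point of coupling between the framework and any
// concrete codec is this single lookup.
class CodecRegistry {
    private static final Map<String, ErasureCodec> CODECS = new HashMap<>();
    static void register(String name, ErasureCodec codec) { CODECS.put(name, codec); }
    static ErasureCodec get(String name) { return CODECS.get(name); }
}
```

Introducing a custom codec is then a single call such as `CodecRegistry.register("xor", new XorCodec())`, which is the kind of extension point that makes the framework pluggable in the sense described above.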
Keywords/Search Tags: HDFS, Erasure coding, Pluggable, Striping layout, Hierarchical naming protocol