
Research And Implementation Of Distributed Erasure Coded Storage System For Hot Data

Posted on: 2021-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2428330620968182
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, the world has entered the era of big data. Massive amounts of data are generated every day, which steadily increases the storage overhead of distributed storage systems. The redundancy mechanisms that these systems use to keep data highly available make the problem worse. Distributed storage systems rely on two main redundancy mechanisms: multiple replicas and erasure codes. Compared with replication, erasure codes apply specific coding rules to generate a small amount of redundant data, which greatly reduces storage overhead. However, because the coding rules are complex, read, write, and update operations in an erasure-coded storage system consume more CPU, network I/O, and disk I/O resources, which leads to high operation latency. As a result, erasure codes are mainly used for cold or warm data, while hot data that is frequently accessed and updated is still stored as multiple replicas to preserve performance.

To address the high latency of erasure-coded storage systems in hot data scenarios, this thesis designs a log-structured storage strategy, LSEC (Log-Structured Erasure Coding), which combines replication and erasure coding at the system architecture level to meet performance requirements while improving the storage efficiency of the system.

(1) Given the high latency of write and update operations under erasure coding in hot data storage, we design a new storage management strategy, LSEC, that is based on a log structure and combines replication with erasure coding. It uses a non-volatile buffer to persist data temporarily, which ensures data persistence and low request latency, and it improves storage efficiency through asynchronous erasure coding.

(2) To reduce the performance impact of frequent garbage collection (GC) on log-structured storage, we propose a partition GC method, which partitions storage nodes at stripe granularity so that GC runs locally, further improving system performance.

(3) We build a prototype that implements the proposed strategies. Experimental results show that LSEC reduces write/update latency by approximately 1.7x to 20x compared with DRAM-based erasure-coded storage systems and SSD-based storage systems that use multiple replicas. The results also show that the partition GC strategy effectively reduces the impact of GC activity on overall system performance.
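The write path described in (1) can be illustrated with a minimal Python sketch. This is not the thesis prototype: the class `LogBufferedStore`, its methods, and the helper functions are hypothetical names, an in-memory list stands in for the replicated non-volatile buffer, and a single XOR parity block stands in for whatever erasure code the system actually uses. It only shows the general idea that writes are acknowledged after a log append and that encoding happens later, off the request path.

```python
from dataclasses import dataclass, field
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (a single-parity stand-in code)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def storage_overhead(k, m):
    """Raw bytes stored per user byte for a stripe of k data + m parity blocks."""
    return (k + m) / k


@dataclass
class LogBufferedStore:
    """Hypothetical model: append-only log first, erasure coding afterwards."""
    k: int = 4                                    # data blocks per stripe (assumed)
    log: list = field(default_factory=list)       # stands in for the non-volatile buffer
    stripes: list = field(default_factory=list)   # encoded stripes: (entries, parity)

    def write(self, key, block):
        """Append to the log and acknowledge; no encoding on the request path."""
        self.log.append((key, block))
        return "ack"

    def read(self, key):
        """Newest version wins: search the log first, then the encoded stripes."""
        for entry_key, block in reversed(self.log):
            if entry_key == key:
                return block
        for entries, _parity in reversed(self.stripes):
            for entry_key, block in reversed(entries):
                if entry_key == key:
                    return block
        raise KeyError(key)

    def encode_pending(self):
        """Background step: move full groups of k log entries into stripes."""
        while len(self.log) >= self.k:
            group, self.log = self.log[:self.k], self.log[self.k:]
            parity = xor_parity([block for _, block in group])
            self.stripes.append((group, parity))

    def recover(self, stripe_index, lost):
        """Rebuild one lost data block from the surviving blocks and the parity."""
        entries, parity = self.stripes[stripe_index]
        survivors = [b for i, (_, b) in enumerate(entries) if i != lost]
        return xor_parity(survivors + [parity])


store = LogBufferedStore(k=2)
store.write("a", b"\x01\x02")
store.write("b", b"\x03\x04")
store.write("a", b"\x05\x06")   # update: a new log entry, the old version becomes stale
store.encode_pending()          # would run asynchronously in a real system
```

In this sketch an update never rewrites an encoded stripe. It appends a new version and leaves a stale block behind, which is why log-structured designs need garbage collection, the cost that contribution (2) targets. For comparison, `storage_overhead(4, 1)` gives 1.25 raw bytes per user byte, whereas three-way replication stores 3.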
Keywords/Search Tags: erasure coding, multiple replicas, data update, distributed storage systems, hot data