Hybrid Fault Tolerance In Distributed In-memory Storage System

Posted on:2023-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:Z Gong

GTID:2568306929490684

Subject:Computer system architecture

Abstract/Summary:

With the rapid growth of data-driven businesses, in-memory storage systems are becoming an important building block of data centers. They provide sub-millisecond latency and improve the concurrency of user applications. However, because memory is volatile and deployments are large, data loss is common, and recovering lost data is expensive. Fault tolerance is therefore indispensable for in-memory storage systems. There are typically two redundancy-based fault-tolerance schemes: replication and erasure coding (EC). Replication offers higher performance, while EC offers higher storage efficiency. To balance time and space overhead, a hybrid fault-tolerance scheme that combines replication with EC can be employed in an in-memory storage system. Moreover, the performance and storage-efficiency requirements of some data change over time, so the system must be able to change the fault-tolerance scheme of that data dynamically. To this end, we implement ElasticMem, a hybrid fault-tolerant system based on Memcached, a popular distributed in-memory storage system. The research work and main contributions of this thesis are summarized in the following three aspects:

(1) Hybrid fault tolerance and redundancy transition. We implement ElasticMem, a distributed in-memory storage system with a hybrid fault-tolerance scheme that incorporates both replication and EC. ElasticMem can use either replication or EC for each stored data item, and can perform redundancy transition to change the redundancy scheme of that data dynamically as its storage requirements change.

(2) EC-oriented replication to optimize redundancy transition. We introduce the Erasure Coding Oriented Replication (EOR) layout in ElasticMem. EOR determines the placement of replicas according to the data layout of EC, which significantly reduces the I/O overhead of redundancy transition and improves its performance. At the same time, EOR still provides the same level of fault tolerance as replication, with similar access performance.

(3) Lightweight and efficient solution to concurrent consistency problems. For data stored with block-based schemes such as EC, we identify a potential consistency issue under concurrent access and analyze its causes. We implement a lightweight table-based scheme in ElasticMem that detects correlated concurrent read and write requests and schedules them to avoid consistency issues. In addition, we design a data bypass that serves subsequent correlated requests from local data, which saves network overhead and improves access performance.

Our testbed experiments show that ElasticMem reduces redundancy transition time by up to 35% by leveraging EOR. Additionally, with data bypass, ElasticMem reduces the latency of a single request to at most 6 μs and markedly reduces the overall latency of multiple concurrent requests with dependencies.
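The trade-off between replication and EC described in the abstract can be made concrete with a small sketch. The code below is not ElasticMem's implementation. It assumes a single-parity XOR code (the simplest erasure code, with one parity chunk), whereas the abstract does not say which code or parameters the thesis uses. All function names are hypothetical and chosen for illustration only.

```python
from functools import reduce


def replication_overhead(copies: int) -> float:
    """Bytes stored per byte of user data when keeping `copies` full replicas."""
    return float(copies)


def ec_overhead(k: int, m: int) -> float:
    """Bytes stored per byte of user data with k data chunks and m parity chunks."""
    return (k + m) / k


def xor_parity(chunks: list[bytes]) -> bytes:
    """XOR equal-length chunks byte by byte (illustrative single-parity code, m = 1)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one lost data chunk from the surviving chunks and the parity."""
    return xor_parity(surviving + [parity])


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]  # k = 3 data chunks (assumed sizes)
    parity = xor_parity(data)           # m = 1 parity chunk

    # Simulate losing the second chunk and rebuilding it from the rest.
    rebuilt = recover_missing([data[0], data[2]], parity)
    assert rebuilt == data[1]

    # Storage cost: 3-way replication vs. a (3, 1) single-parity stripe.
    print("replication:", replication_overhead(3))
    print("erasure coding:", ec_overhead(3, 1))
```

The sketch shows both sides of the trade-off: the coded stripe stores less redundant data than three full replicas, but rebuilding a lost chunk requires reading every surviving chunk in the stripe, whereas replication only needs to read one copy. Note that this single-parity stripe tolerates only one lost chunk, while 3-way replication tolerates two lost copies, so the two configurations are not equivalent in fault tolerance.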

Keywords/Search Tags:

distributed in-memory storage system, hybrid fault tolerance, replication, erasure coding

Related items

1	Research On Erasure Code Based Data Fault-tolerant Technology In Distributed Storage
2	Design And Implementation Of Distributed Video Storage Fault-tolerant System
3	Research On Distributed Fault-Tolerant Storage Technology Based On Erasure Code
4	Research On Erasure Codes Based Data Fault Tolerance And Repair For Mobile Distributed Storage Clusters
5	The Research And Implementation Of Distributed Storage System Fault-tolerance Mechanism
6	Reliability And Fault Tolerance Research Of Distributed Ocean Storage System
7	Research On Fault-tolerant Optimization Strategy Based On Erasure Coding In Distributed Storag
8	Study On Technology Of Data Error Tolerance And Disaster Tolerance In Network Storage
9	Study On Coding Workflow In Erasure-coded Storage Systems
10	Study Of Fault Tolerance Mechanism Based On Erasure Code In Distributed Storage Systems