Research On Fast Recovery In Large-scale Storage System

Posted on: 2020-02-23

Degree: Master

Type: Thesis

Country: China

Candidate: Z F Wang

Full Text: PDF

GTID: 2428330626964593

Subject: Computer Science and Technology

Abstract/Summary:

The scale of distributed storage systems is growing rapidly. If every device in a storage system has a constant probability of failing, then as the number of devices grows, the data durability and availability of a large-scale storage system decrease. A fast data recovery rate can improve durability and availability in such systems. However, because the system is serving its customers at the same time, accelerating data recovery blindly interferes with foreground traffic, which degrades the performance of both recovery and foreground traffic and wastes precious bandwidth. A fast, low-interference data recovery approach is therefore needed. To this end, this thesis explores how to accelerate data recovery in a large-scale storage system with minimal interference to foreground traffic. Based on observations from a production system, this thesis identifies why existing approaches fail to produce good recovery plans and designs a timeslot-based centralized scheduling framework. To make such a centralized scheduler both fast and effective, this thesis proposes a series of key techniques, grounded in these observations, that improve scheduling quality and speed. With these designs, the protocol proposed in this thesis achieves fast recovery with low interference to foreground traffic.

The main contributions of this thesis are:

(1) By investigating I/O and failure traces from a real-world large-scale storage system, this thesis finds that, because of the scale of the system and its imbalanced and dynamic foreground traffic, no existing recovery protocol can generate a high-quality recovery strategy in a short time. In addition, when a node fails, there are massive numbers of chunks to recover and a large number of candidate helpers, so sophisticated scheduling algorithms fail to produce a result in a short time.

(2) Based on these observations, this thesis proposes Dayu, a timeslot-based recovery protocol that schedules only the subset of tasks expected to finish within one timeslot. This approach reduces the computation overhead and copes naturally with dynamic foreground traffic. In each timeslot, Dayu incorporates four key algorithms to achieve fast, high-quality scheduling.

(3) Dayu is implemented on top of Pangu and tested both on a real-world cluster and in a simulation environment. Evaluations on a 1,000-node real cluster confirm that Dayu outperforms existing recovery protocols, achieving high recovery speed and low interference. Evaluations on a 25,000-node simulation confirm that Dayu scales well.
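The timeslot idea in contribution (2) can be illustrated with a minimal sketch. This is not Dayu's implementation, and it does not reproduce the four algorithms the thesis mentions, which the abstract does not describe. It assumes a simple greedy rule as a stand-in, considers only the source (helper) side of each transfer, and uses hypothetical names throughout (`Task`, `schedule_timeslot`, `recover`, `measure_spare_mbps`). The point it shows is that each round plans only the tasks that fit within one timeslot of spare bandwidth, then re-measures foreground traffic before planning the next round.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    """One chunk to recover (hypothetical structure for illustration)."""
    chunk_id: str
    size_mb: float
    sources: List[str]  # candidate helper nodes that hold a copy


def schedule_timeslot(
    pending: List[Task],
    spare_mbps: Dict[str, float],
    slot_seconds: float,
) -> Tuple[List[Tuple[str, str]], List[Task]]:
    """Greedily pick tasks expected to finish within one timeslot.

    spare_mbps maps each node to the bandwidth (MB/s) left over after
    foreground traffic. Returns (assignments, deferred), where each
    assignment is a (chunk_id, source_node) pair.
    """
    # Data volume each node can send during this timeslot.
    budget = {node: bw * slot_seconds for node, bw in spare_mbps.items()}
    assignments: List[Tuple[str, str]] = []
    deferred: List[Task] = []
    for task in pending:
        # Keep only helpers that can still finish this chunk in the slot.
        candidates = [s for s in task.sources if budget.get(s, 0.0) >= task.size_mb]
        if not candidates:
            deferred.append(task)  # retry in a later timeslot
            continue
        # Assumed heuristic: use the helper with the most remaining budget.
        source = max(candidates, key=lambda s: budget[s])
        budget[source] -= task.size_mb
        assignments.append((task.chunk_id, source))
    return assignments, deferred


def recover(
    pending: List[Task],
    measure_spare_mbps: Callable[[int], Dict[str, float]],
    slot_seconds: float,
    max_slots: int = 100,
) -> Tuple[List[List[Tuple[str, str]]], List[Task]]:
    """Plan recovery slot by slot, re-measuring spare bandwidth each time."""
    plan: List[List[Tuple[str, str]]] = []
    for slot in range(max_slots):
        if not pending:
            break
        assignments, pending = schedule_timeslot(
            pending, measure_spare_mbps(slot), slot_seconds
        )
        plan.append(assignments)
    return plan, pending
```

Because each call to `schedule_timeslot` looks only at what fits in the current slot, the per-round work stays bounded even when a failed node leaves many chunks pending, and a change in foreground load is picked up at the next measurement rather than invalidating a long precomputed plan.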

Keywords/Search Tags:

large-scale storage system, data recovery, scheduling, fast and low-interference

Related items

1	Fast Analysis Of Large-scale Wafer Inspection Data
2	Design Of Large-capacity Storage Module For 12bit High-speed Data Acquisition System
3	The Research On Data Placement And Schedule Scheme In Large-Scale Multimedia Storage System
4	A Study On The Transformation Performance Improvement Of Cloud Storage System
5	An Analytical System For Large Scale Semantic Data
6	Research On Data Equalization Method Of Large Scale Erasure Code Storage System
7	Research On High-efficient Data Transmission Techniques In Large-Scale Distributed Erasure-Coded Storage Systems
8	Research On A Topology-based Multilevel Algorithm For Large-scale Task Scheduling In Clouds
9	Study On Large-scale Workshop Scheduling Problem In Discrete Manufacturing Enterprise APS
10	Research On User Scheduling And Precoding For Large-scale Antenna Systems