Research On Erasure Codes Based Data Fault Tolerance And Repair For Mobile Distributed Storage Clusters

Posted on: 2023-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wu
GTID: 1528307046956469
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of embedded devices, network communication technology, and the Internet of Things, distributed file systems are widely deployed on mobile nodes with computing and storage capabilities. Data collected by a node is distributed to other nodes over a wireless network, forming a mobile distributed storage cluster. In mobile scenarios, nodes may fail because of the external environment or internal faults, which degrades the reliability of the storage cluster. A redundant fault-tolerant storage mechanism can ensure data reliability, but it creates a storage shortage because the storage space of mobile edge nodes is usually limited. Erasure codes can ensure data reliability with a small storage overhead. However, they incur a large data transfer volume and a long recovery time during data restoration, which makes data reliability costly.

Considering the diversity of mobile application scenarios, mobile clusters with different characteristics face different challenges in ensuring data reliability: 1) in low-dynamic clusters with weak node mobility, erasure-coded repair takes a long time and leads to data unavailability; 2) in highly dynamic clusters with high node failure rates and strong mobility, relying on erasure codes alone incurs a high reliability cost; 3) grouped clusters are suitable for erasure codes but face long repair times. This thesis therefore focuses on data fault tolerance and repair technologies for mobile distributed storage clusters. Firstly, a predictive repair mechanism is proposed for low-dynamic mobile distributed clusters to reduce the predictive repair time. Secondly, a hybrid fault-tolerant method is proposed for highly dynamic mobile clusters to reduce the cost of ensuring cluster reliability. Finally, an erasure-coded cross-group repair method is proposed for grouped mobile clusters to reduce the repair time and improve cluster repair performance. The research work covers the following three aspects.

First, this thesis proposes a Lazy Fast Predictive Repair Strategy (LFPR) mechanism to solve the problem of long predictive repair time. The mechanism is coupled with a lazy reconstruction and migration strategy. It applies a failure scenario partitioning mechanism to soon-to-fail blocks with different properties. Moreover, to ensure data availability, two repair queue determination mechanisms are proposed: one that delays the repair of parity blocks, for hot storage clusters, and one that delays the repair of all blocks, for cold storage clusters. Finally, a predictive repair optimization mechanism is proposed to reduce the predictive repair time by improving repair parallelism.

Second, because existing redundant fault-tolerant schemes incur a high cost to ensure reliability in highly dynamic clusters, this thesis proposes a Hybrid Fault Tolerance Strategy Combining Erasure Codes and Replicas (Mobile RE) to reduce the reliability cost rate. The storage reliability cost rate is defined based on the characteristics of the cluster. We then propose a bandwidth range division strategy based on the fault-tolerant mechanism, together with a hybrid erasure code and replica storage mechanism based on network bandwidth, to reduce the reliability cost rate. Finally, we propose an optimal storage mechanism based on fault tolerance technical parameters to minimize the reliability cost rate.

Third, this thesis proposes a Fast Location-aware Repair Strategy mechanism (FLARepair) for grouped mobile clusters to solve the problem that erasure codes face large cross-group transmission traffic and long repair times. Based on an analysis of the optimal repair time problem, this thesis proposes a location-aware reconstruction set acquisition mechanism to minimize the cross-group repair traffic. In addition, we propose a cross-group dynamic repair mechanism based on the optimal middle decoding node, which reduces repair time by selecting the optimal decoding node for the middle part among multiple reconstruction groups.

We implement the proposed techniques in dynamic numerical simulations and in static cluster simulations on real platforms. The experimental results show that LFPR reduces the repair time by 27.11% compared with the existing predictive repair mechanism. The reliability cost rate of Mobile RE is 33.27% lower than that of a single fault tolerance mechanism. Moreover, FLARepair reduces the repair time by 9.66% compared with the existing cross-group repair mechanism.
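
The trade-off that motivates the abstract (erasure codes save storage but cost more to repair than replicas) can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the thesis: it assumes a generic maximum distance separable code such as Reed-Solomon with k data blocks and m parity blocks, repaired conventionally by reading k surviving blocks, and the parameters (6, 3) and 3 copies are hypothetical examples. The names `Scheme`, `replication`, and `reed_solomon` are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    storage_overhead: float  # units stored per unit of user data
    repair_reads: int        # blocks read to rebuild one lost block
    tolerated_failures: int  # simultaneous block losses survived


def replication(copies: int) -> Scheme:
    # Each block is stored `copies` times; any surviving copy repairs a loss.
    return Scheme(f"{copies}-way replication", float(copies), 1, copies - 1)


def reed_solomon(k: int, m: int) -> Scheme:
    # Assumption: an MDS code with conventional repair, which reads
    # k surviving blocks to reconstruct a single lost block.
    return Scheme(f"RS({k},{m})", (k + m) / k, k, m)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the contrast.
    for s in (replication(3), reed_solomon(6, 3)):
        print(f"{s.name}: overhead {s.storage_overhead:.2f}x, "
              f"reads {s.repair_reads} block(s) per repair, "
              f"tolerates {s.tolerated_failures} failure(s)")
```

Under these assumptions, the erasure-coded layout stores half as much as three replicas while tolerating more failures, but each single-block repair moves six blocks over the network instead of one, which is the repair cost the three proposed mechanisms aim to reduce.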
Keywords/Search Tags:Mobile distributed cluster, Distributed storage, Erasure codes, Data fault tolerance, Data recovery