
Research On Erasure Code Based Data Fault-tolerant Technology In Distributed Storage

Posted on: 2016-03-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W D Sun
Full Text: PDF
GTID: 1318330536967116
Subject: Computer Science and Technology
Abstract/Summary:
Since large-scale distributed storage systems provide data access services for modern business computing, scientific computing, etc., they have become the information infrastructure of modern society. With the expansion of data volumes, modern distributed storage systems usually contain millions of storage nodes, and data volumes reach PB or even EB scale. In this scenario, failures become normal behavior rather than exceptions, which poses a new challenge: how to improve the data fault-tolerance capacity of distributed storage systems. Redundancy is the main technique for enhancing data fault tolerance, and there are two main redundancy-based data reliability technologies: replication and erasure codes. Replication-based technology creates several replicas of each data object to improve its reliability. It is easy to implement; however, its huge storage overhead makes it hard to deploy in storage systems in the modern big-data era. Compared with replication, erasure-code based technology can dramatically reduce the storage space required while providing the same level of data reliability, and has thus become an important research topic in the distributed storage field. In modern large-scale distributed storage systems, the challenges of erasure-code based data reliability lie in the following aspects: (1) most existing sequential coding approaches are inefficient, which hinders the application of erasure codes in distributed storage systems; (2) when some data blocks fail, data reading efficiency degrades severely, making it difficult to meet users' data access requests; (3) large data volumes must be transferred when repairing failed blocks, which leads to high repair cost. In this dissertation, we focus our research on the above challenges of erasure-code based data reliability for distributed storage systems.

Existing sequential coding approaches for erasure codes are inefficient because they are limited to executing the high-overhead Galois Field calculations on a single CPU core. Parallelism is a key technology for improving the efficiency of erasure coding. However, present parallel coding approaches have a number of drawbacks: (1) low applicability, which means that existing approaches are usually limited to a specific hardware platform or to certain erasure codes; (2) low efficiency, which means that existing approaches do not deeply analyze the characteristics of the I/O and coding procedures; (3) high thread scheduling overhead, which means that the default thread scheduling algorithm incurs high cost. To this end, a general parallel coding approach for erasure codes on multi-core platforms, called ParaErasure, is proposed in this dissertation. We first model the parallel coding procedure and propose a general multi-threaded parallel coding model adapted to all erasure codes, named MTPerasure. Second, based on the MTPerasure model, two parallel coding algorithms are proposed for different I/O environments. To reduce the high data synchronization latency between threads under high I/O throughput, we propose a static-data-dividing based parallel coding algorithm, called sdaParallel. Through a static data dividing method, sdaParallel divides the original data object into multiple smaller parts, each of which is statically allocated to a single thread for data reading, coding, and writing. Synchronization latency among threads is thus eliminated in the high-throughput environment, which greatly improves coding efficiency. To reduce the high thread-switching overhead under low I/O throughput, we propose a dynamic-data-dividing based parallel coding algorithm, called ddaParallel. Through a dynamic data dividing method, ddaParallel divides the original data object into multiple strips. It creates two threads to read and write these strips, respectively, and multiple coding threads to dynamically encode/decode the pending data strips. These parallel threads significantly reduce thread-switching costs and improve coding efficiency. Moreover, an exclusive thread scheduling algorithm is proposed in ParaErasure to bind each coding thread to a CPU core for as long as possible, so that thread-switching overhead can be further reduced. Compared with the original sequential approach, a thorough experimental evaluation indicates that ParaErasure achieves a speedup of 1.3X in the low-throughput I/O environment and 5X in the high-throughput I/O environment, which greatly improves coding efficiency.

Existing data dividing approaches simply divide the original data object into several equal-sized data blocks, so address-successive data strips are dispatched into the same data block. When a number of data blocks fail, the data reading operation must download far more data from multiple active nodes than the read operation actually requests, which results in high network traffic, large coding calculation overhead, and low data reading efficiency. To this end, we propose a strip-mapping based discrete data dividing approach, called d-dividing. In the d-dividing approach, a data object is divided into many fine-grained data strips. These data strips are then mapped to multiple groups, each of which represents a data block. To lower the network traffic and computation cost, successive strips in the original data object are mapped to different blocks, reducing the data volumes and coding calculations of the data reading operation. When data blocks fail, the d-dividing approach enables the reading operation to acquire multiple data strips together in a single decoding calculation, rather than only one strip as in the traditional approach. Therefore, the d-dividing approach significantly decreases the volume of downloaded data and the number of decoding calculations, which yields high data reading efficiency. Compared with the existing data dividing approach, detailed experimental results indicate that, when there are more than 2 failed data blocks, the d-dividing approach decreases the transferred data volume by about 50%, reduces decoding calculations by about 40%, and improves data reading efficiency by about 1X.

Link competition in existing parallel repair methods for multiple data losses usually results in low repair success probability and low data availability. To resolve this problem, we propose a grouping-iteration based multi-loss parallel repairing approach, called GIMPR. GIMPR treats the repair process for multiple losses as multiple iterated loops, each of which is composed of three steps. (1) Some nodes are selected from the failed nodes and organized into a group that can be repaired in parallel. To increase the level of parallelism, we propose a greedy-strategy based group construction algorithm, called GSGC, to increase the number of nodes in the group. GSGC inserts failed nodes into the group according to a policy that gives high priority to nodes belonging to different data objects. (2) A repair topology is constructed for the nodes in the group. To reduce the repair cost, we propose a spanning-tree based automatic repair topology construction algorithm, called ARTC. ARTC chooses as many providers as possible to construct the topology, reducing the transferred data volume. (3) All failed nodes in the group are repaired in parallel. The data are transferred from the leaf nodes to the root and encoded according to regenerating codes in the inner nodes of the tree to reduce the network cost. After all the nodes in one group complete the repair process, another group is constructed for the next iteration. Meanwhile, the providers and replacers of the last iteration can be employed as providers in the next iteration to reduce the traffic overhead. Compared with traditional methods, experimental results show that GIMPR can improve the successful repair probability, data availability, and repair efficiency by over 30%, 30%, and 50%, respectively.
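To illustrate the static-data-dividing idea behind sdaParallel, the sketch below is a hypothetical, heavily simplified version: the object is split into one contiguous part per thread, and each thread encodes its part independently with no inter-thread synchronization. Plain XOR parity stands in for the real Galois Field erasure-coding arithmetic, and all names (`encode_part`, `sda_parallel_encode`) are illustrative, not the dissertation's actual interfaces.

```python
# Hypothetical sketch of static-data-dividing parallel encoding.
# Each thread owns one contiguous part of the object and encodes it
# independently, so no synchronization between threads is needed.
# XOR parity is a stand-in for real Galois Field coding.
from concurrent.futures import ThreadPoolExecutor

def encode_part(part: bytes, k: int) -> bytes:
    """Split one part into k data strips and XOR them into a parity strip."""
    strip_len = len(part) // k
    parity = bytearray(strip_len)
    for i in range(k):
        strip = part[i * strip_len:(i + 1) * strip_len]
        for j, b in enumerate(strip):
            parity[j] ^= b
    return bytes(parity)

def sda_parallel_encode(data: bytes, num_threads: int, k: int) -> list:
    """Statically assign one part per thread; each thread encodes on its own."""
    part_len = len(data) // num_threads
    parts = [data[i * part_len:(i + 1) * part_len] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda p: encode_part(p, k), parts))
```

Because each part is read, encoded, and written by exactly one thread, threads never wait on each other, which mirrors the synchronization-free behavior the abstract attributes to sdaParallel under high I/O throughput.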
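The d-dividing mapping of successive strips to different blocks can be sketched as a simple round-robin assignment. This is a minimal illustration under the assumption that the mapping is round-robin (the dissertation only states that successive strips go to different blocks); the function name `d_dividing` is ours.

```python
# Hypothetical round-robin sketch of the d-dividing strip mapping:
# successive fine-grained strips land in different blocks, so a
# sequential read spans all blocks instead of hitting a single one.
def d_dividing(data: bytes, num_blocks: int, strip_size: int) -> list:
    strips = [data[i:i + strip_size] for i in range(0, len(data), strip_size)]
    blocks = [[] for _ in range(num_blocks)]
    for idx, strip in enumerate(strips):
        blocks[idx % num_blocks].append(strip)   # strip i -> block i mod n
    return [b"".join(block) for block in blocks]
```

For example, `d_dividing(b"aabbccddeeff", 3, 2)` scatters the six strips so that each block holds every third strip; when a block fails, one decoding of the surviving blocks recovers several strips of a sequential read at once, which is the source of the traffic and computation savings claimed above.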
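The GSGC grouping step can also be sketched. The abstract says nodes from different data objects get high priority for joining the same group; the sketch below hardens that priority into a strict rule (at most one failed node per object per group) for simplicity, so it is an assumption-laden approximation, not the actual GSGC algorithm, and `gsgc_grouping` is an invented name.

```python
# Hypothetical GSGC-style greedy grouping: each iteration's group takes
# at most one failed node per data object, so members can be repaired in
# parallel without competing for the same object's provider links.
def gsgc_grouping(failed_nodes: list) -> list:
    """failed_nodes: list of (node_id, object_id) pairs."""
    remaining = list(failed_nodes)
    groups = []
    while remaining:
        group, used_objects, rest = [], set(), []
        for node_id, obj_id in remaining:
            if obj_id not in used_objects:   # new object -> repairable in parallel
                group.append(node_id)
                used_objects.add(obj_id)
            else:                            # defer to a later iteration
                rest.append((node_id, obj_id))
        groups.append(group)
        remaining = rest
    return groups
```

Each returned group corresponds to one GIMPR iteration: a repair topology is built for the group, its nodes are repaired in parallel, and the next group is formed from the deferred nodes.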
Keywords/Search Tags:distributed storage, fault-tolerance, erasure codes, parallel coding, data repairing, data dividing