An Efficient Hybrid Update Mechanism In Distributed Storage Systems With Erasure Coding

Posted on: 2020-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q Luo
Full Text: PDF
GTID: 2428330623959860
Subject: Computer Science and Technology
Abstract/Summary:
Replication is widely used as a fault-tolerance mechanism in distributed storage systems. Fault tolerance improves as the number of copies increases. However, with the advent of the big data era, data volumes are growing quickly and storage costs are high. To address the excessive redundancy of replication, erasure coding has gradually been introduced into distributed storage systems as a new fault-tolerance mechanism. A distributed storage system with erasure coding stores and recovers data through encoding and decoding computation. Importantly, erasure coding saves storage space.

At the same time, how to protect important data more effectively and improve the security of distributed storage systems should be considered. Data storage and data transfer are the two important parts of a distributed storage system that involve security issues. In a distributed storage system with erasure coding, data is encoded and stored on different storage nodes, which enhances the security of the system to a certain extent, but the data can still be leaked easily.

For many real-world workloads in network file systems, data updates are common. However, most existing distributed storage systems, such as HDFS, QFS and Ceph, support only reads, writes and appends, without in-place updates. We complement the above studies by improving update efficiency in erasure-coded distributed storage systems. In addition, the generator matrix of an erasure code consists of the identity matrix and the coding matrix. The identity matrix preserves the original data, and the coding matrix generates the redundant information. This structure can speed up reading and writing, but it cannot effectively guarantee the security of the storage system.

To extend the mainstream update approaches in distributed storage systems and to address the data security problems of erasure codes, the main contributions of the thesis are summarized as follows:

(1) In view of the frequent updates and variable update scope in storage systems, a hybrid update mechanism that is aware of update size is proposed. The two RAID update mechanisms (reconstruction writes and read-modify writes) are mapped onto a distributed storage system based on erasure coding. Testbed experiments are conducted on the different update mechanisms in WAN and LAN environments. The results show that EcDFS efficiently reduces update latency, especially in WAN, by up to about 28.1% and 24.2% relative to reconstruction writes and read-modify writes, respectively.

(2) By analyzing the security problems in distributed storage systems based on erasure coding, a two-parse security strategy is proposed. The hybrid update mechanism under the two-parse security strategy is then modeled, and it is concluded that the rate of encryption and decryption should be increased as much as possible to suit the hybrid update mechanism.

(3) Experiments are conducted on different open-source encryption libraries to compare and analyze their encryption and decryption performance. This fills a gap in the performance analysis of open-source encryption libraries and provides good support for an efficient prototype system. The results show that OpenSSL and Bouncy Castle support numerous kinds of symmetric encryption algorithms, and that OpenSSL performs better.

(4) An erasure-coded distributed system called EcDFS is further built, with strong consistency among data chunks and parity chunks. The experimental results show that the write throughput of EcDFS is 2x faster than HDFS-RAID and nearly 1.5x faster than HDFS. With two node failures, its read performance is also 2x higher than HDFS-RAID and similar to HDFS. In addition, the write throughput with the two-parse security strategy does not decline much. The read throughput is 1.6x faster than HDFS-RAID, while it is slightly lower than HDFS.
Keywords/Search Tags: Erasure code, Update mechanism, Distributed storage, Security