
Research And Implementation Of Data Compression In Distributed Storage

Posted on: 2019-05-16  Degree: Master  Type: Thesis
Country: China  Candidate: G Q Zeng
GTID: 2348330563953987  Subject: Computer application technology
Abstract/Summary:
With companies' growing demand for high-performance storage, SSD-based distributed storage systems are being widely deployed. Cost pressure, however, has pushed academia and industry to study how to build SSD-based storage systems at a cost close to that of HDD storage while preserving high performance. Active research directions include erasure coding, deduplication, and efficient compression algorithms, but the existing compression schemes and algorithms each have limitations. To address these issues, and taking the characteristics of distributed storage into account, this thesis designs a scheme that integrates elastic data compression into the distributed storage client. The scheme consists of a dictionary management module, a detection module, and a compression/decompression module.

Nearly all common distributed file systems provide POSIX clients through which the entire storage system is read and written. Integrating data compression in the client therefore distributes the compression workload across all clients and keeps the design portable. Second, the design exploits the characteristics of distributed storage by adding preset dictionaries to the compression algorithm, which improves the storage of small files. Third, the detection module measures file compressibility more rigorously: it first checks whether a file is an audio, video, or image file and, if so, skips compression entirely; for all other files, a heuristic estimation algorithm predicts whether the file can be usefully compressed. Finally, the compression level is chosen according to the client's current load, so that the whole system achieves a high compression ratio while retaining high IO performance.

Tests show that, after elastic data compression is integrated into the client and the distributed storage system's data redundancy is set to a single copy, compressing small files increases storage-system IO throughput by about 70% while saving about 160% of the storage space, and compressing large files saves about 120% of the storage space while increasing IO throughput by about 40%. With multiple redundant copies configured, the benefit is even more pronounced.
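The abstract does not name the compression library or the dictionary format used by the dictionary management module. As a minimal sketch, assuming zlib/DEFLATE and a hypothetical preset dictionary PRESET_DICT built from byte strings that recur across many small files, Python's zlib module can prime both the compressor and the decompressor with the same dictionary through its zdict parameter (compress_small and decompress_small are illustrative names):

```python
import zlib
from typing import Optional

# Hypothetical preset dictionary: fragments assumed to recur across many small
# files (here, a JSON-like record layout). zlib favors matches near the end.
PRESET_DICT = (
    b'"status": "error", "message": '
    b'{"user_id": , "timestamp": , "status": "ok", "payload": ""}'
)


def compress_small(data: bytes, zdict: Optional[bytes] = None, level: int = 6) -> bytes:
    """Compress with zlib, optionally primed with a preset dictionary."""
    kwargs = {"zdict": zdict} if zdict is not None else {}
    c = zlib.compressobj(level, **kwargs)
    return c.compress(data) + c.flush()


def decompress_small(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the same dictionary used for compression must be supplied."""
    kwargs = {"zdict": zdict} if zdict is not None else {}
    d = zlib.decompressobj(**kwargs)
    return d.decompress(blob) + d.flush()


if __name__ == "__main__":
    record = b'{"user_id": 7, "timestamp": 1557964800, "status": "ok", "payload": ""}'
    print("plain :", len(compress_small(record)))
    print("primed:", len(compress_small(record, zdict=PRESET_DICT)))
```

A small file gives DEFLATE almost no history to match against, so the preset dictionary supplies that history up front. Because the decompressor needs the identical dictionary, dictionaries would presumably have to be versioned and distributed to every client, which is plausibly what a dictionary management module is for.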
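The abstract does not say how the detection module recognizes audio, video, and image files. One common approach, shown here as an assumption rather than the thesis's method, is to compare a file's first bytes against well-known magic numbers. The signature table below is illustrative and deliberately incomplete (for example, MP3 files without an ID3 tag are not matched), and is_media_file is a hypothetical name:

```python
# Illustrative (offset, magic bytes) pairs for common image, audio and video
# formats whose payloads are usually already compressed.
MEDIA_SIGNATURES = [
    (0, b"\xff\xd8\xff"),             # JPEG
    (0, b"\x89PNG\r\n\x1a\n"),        # PNG
    (0, b"GIF87a"),                   # GIF (1987)
    (0, b"GIF89a"),                   # GIF (1989)
    (0, b"ID3"),                      # MP3 with an ID3v2 tag
    (0, b"fLaC"),                     # FLAC
    (0, b"OggS"),                     # Ogg container (Vorbis, Opus, Theora)
    (0, b"\x1a\x45\xdf\xa3"),         # Matroska / WebM (EBML header)
    (4, b"ftyp"),                     # ISO base media: MP4, MOV, HEIC
]

# RIFF containers carry their form type at offset 8.
RIFF_MEDIA_TYPES = (b"WAVE", b"AVI ", b"WEBP")


def is_media_file(header: bytes) -> bool:
    """Return True if the first bytes of a file match a known media signature."""
    for offset, magic in MEDIA_SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return True
    return header[:4] == b"RIFF" and header[8:12] in RIFF_MEDIA_TYPES
```

Only the first 16 bytes or so of a file are needed, so this check is cheap enough to run on every write before any compression work is attempted.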
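The heuristic estimation algorithm itself is not described in the abstract. A minimal sketch of one plausible heuristic, assuming order-0 Shannon entropy computed over evenly spaced samples of the file and a hypothetical threshold of 7.0 bits per byte, could look like this (sample_chunks, shannon_entropy and looks_compressible are illustrative names):

```python
import math
from collections import Counter


def sample_chunks(data: bytes, chunk: int = 4096, count: int = 8) -> bytes:
    """Join `count` evenly spaced chunks so large files are not scanned in full."""
    if len(data) <= chunk * count:
        return data
    step = (len(data) - chunk) // (count - 1)
    return b"".join(data[i * step:i * step + chunk] for i in range(count))


def shannon_entropy(sample: bytes) -> float:
    """Order-0 entropy in bits per byte: 0.0 for constant data, 8.0 at most."""
    if not sample:
        return 0.0
    n = len(sample)
    return -sum((c / n) * math.log2(c / n) for c in Counter(sample).values())


def looks_compressible(data: bytes, threshold: float = 7.0) -> bool:
    """Hypothetical rule: compress only if sampled entropy is below the threshold."""
    return shannon_entropy(sample_chunks(data)) < threshold
```

Sampling keeps the estimate cheap for large files. Order-0 entropy ignores repeated sequences, however: a buffer such as bytes(range(256)) * 100 scores exactly 8.0 bits per byte even though DEFLATE compresses it well. Trial-compressing the sample at a fast level is a more accurate, though more expensive, alternative.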
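The abstract states that the compression level is chosen from the client's load but does not give the policy. As a sketch under that assumption, a hypothetical choose_level function can map a normalized CPU load linearly onto zlib levels 1 to 9, and a Unix-only helper can derive the load from os.getloadavg():

```python
import os
import zlib


def current_cpu_load() -> float:
    """1-minute load average divided by CPU count (Unix only; can exceed 1.0)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def choose_level(cpu_load: float, min_level: int = 1, max_level: int = 9) -> int:
    """Hypothetical linear policy: idle client -> max_level, saturated -> min_level."""
    load = min(max(cpu_load, 0.0), 1.0)  # clamp the load to [0, 1]
    return round(max_level - load * (max_level - min_level))


def compress_for_write(data: bytes, cpu_load: float) -> bytes:
    """Compress a write buffer at a level matched to the client's current load."""
    return zlib.compress(data, choose_level(cpu_load))


if __name__ == "__main__":
    load = current_cpu_load()
    print(f"load={load:.2f} -> zlib level {choose_level(load)}")
```

An idle client spends spare CPU on a higher compression ratio, while a busy client drops to the fastest level so that compression does not become the IO bottleneck. Whether compression should instead be skipped entirely above some load is a further policy choice that the abstract leaves open.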
Keywords/Search Tags: distributed storage, data compression, FUSE client, heuristic estimation algorithm