
Design and Implementation of Performance Optimization Strategies for Distributed Storage of Small Files

Posted on: 2017-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: B T Zhang
Full Text: PDF
GTID: 2348330518494822
Subject: Computer technology
Abstract/Summary:
With the advent of the big-data era and the rapid development of mobile Internet technology in recent years, the number of small files online has grown explosively, and enterprises increasingly need to store massive numbers of small files. However, existing mainstream distributed storage systems struggle to meet this demand. Ceph, a newer distributed storage system, is an excellent choice for storing massive small files: it is reliable, high-performance, and highly scalable, with no single point of failure, and it supports object storage, file storage, block storage, and other storage modes. Although Ceph can meet the storage demands of massive small files, there is considerable room to improve its small-file storage performance. It is therefore of real significance and value to study how to optimize the small-file storage performance of Ceph.

This thesis analyzes existing optimization schemes for the storage performance of massive small files, especially schemes that merge small files. Because of a design flaw in its index structure, the current merging scheme suffers from low retrieval efficiency and poor practicality. Building on the previous merging scheme, a new merging scheme is therefore designed and combined with a buffer management technique to realize a Ceph-based performance optimization system for small files. The major work of this thesis is summarized as follows:

1. The background of, and existing solutions to, the problem of massive small files in practical applications are studied. The basic architecture of the Ceph distributed storage system and the major functions of each module are analyzed in depth. Through a study of the read/write procedure of Ceph and an analysis of its performance problems when storing massive small files, performance optimization strategies for small files suited to Ceph are put forward.

2. According to the storage characteristics of massive small files, a merging algorithm for small files is designed that classifies small files by size and type. Small files of the same category are then merged into one large file, which effectively reduces the number of files. In addition, a buffer management algorithm is designed on the client side. It manages the buffer memory using the time interval between accesses and the access frequency of each file, which effectively improves the hit rate of files in the client-side cache.

3. Based on the merging algorithm and the buffer management algorithm presented in this thesis, a Ceph-based performance optimization system for small files is designed and implemented. The overall architecture of the system is discussed in detail, together with the design principles and implementation procedure of each module. Finally, a test environment for the system is set up in Storage Lab, and the read/write performance of the optimized system is compared with that of the unoptimized system. The experimental results show a noticeable improvement in the small-file read/write performance of the optimized Ceph system.
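The merging and client-side buffer management ideas summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the index layout, the size classes, or the cache scoring rule, so everything below is an assumption made for illustration: the names `merge_small_files`, `read_small_file`, `IndexEntry`, and `ClientCache`, the 1 MiB small-file threshold, the size buckets, and the score `hits / (1 + time since last access)`. In-memory dictionaries stand in for Ceph objects, and no Ceph API is used.

```python
import os
from collections import defaultdict
from dataclasses import dataclass

SMALL_FILE_LIMIT = 1024 * 1024  # assumed threshold; not stated in the abstract


@dataclass
class IndexEntry:
    blob_id: str  # merged object that holds the file
    offset: int   # byte offset of the file inside the merged object
    length: int   # file length in bytes


def size_class(n):
    """Assumed size buckets; the thesis's real classes are not given."""
    if n < 4 * 1024:
        return "tiny"
    if n < 64 * 1024:
        return "small"
    return "medium"


def merge_small_files(files):
    """Group files by (type, size class) and concatenate each group.

    files maps a file name to its bytes. Returns (blobs, index).
    """
    groups = defaultdict(list)
    for name, data in sorted(files.items()):
        if len(data) > SMALL_FILE_LIMIT:
            continue  # large files would be stored individually
        ext = os.path.splitext(name)[1].lower() or "none"
        groups[(ext, size_class(len(data)))].append((name, data))

    blobs, index = {}, {}
    for (ext, cls), members in groups.items():
        blob_id = f"merged:{ext}:{cls}"
        buf = bytearray()
        for name, data in members:
            index[name] = IndexEntry(blob_id, len(buf), len(data))
            buf += data
        blobs[blob_id] = bytes(buf)
    return blobs, index


def read_small_file(name, blobs, index):
    """Look up the index entry, then slice the file out of its merged object."""
    e = index[name]
    return blobs[e.blob_id][e.offset:e.offset + e.length]


class ClientCache:
    """Evicts the entry with the lowest hits / (1 + time since last access)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # name -> [data, hits, last_access]
        self.clock = 0     # logical clock, advanced on every get/put

    def get(self, name):
        self.clock += 1
        entry = self.entries.get(name)
        if entry is None:
            return None
        entry[1] += 1
        entry[2] = self.clock
        return entry[0]

    def put(self, name, data):
        self.clock += 1
        if name in self.entries:
            self.entries[name][0] = data
            self.entries[name][2] = self.clock
            return
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._score)
            del self.entries[victim]
        self.entries[name] = [data, 1, self.clock]

    def _score(self, name):
        _, hits, last = self.entries[name]
        return hits / (1 + self.clock - last)
```

In this sketch a read costs one index lookup plus one slice of a merged object, so the number of stored objects grows with the number of categories rather than the number of files. The cache score favors entries that are both frequently and recently accessed, which is one plausible way to combine the two signals the abstract mentions.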
Keywords/Search Tags: Ceph distributed storage file system, massive small files, file merging, buffer management