Design And Implementation Of Distributed Storage Middleware For Small Files

Posted on: 2020-09-16 | Degree: Master | Type: Thesis
Country: China | Candidate: X S Jiang | Full Text: PDF
GTID: 2428330590973247 | Subject: Software engineering
Abstract/Summary:
As the AI industry develops, AI has reached into every aspect of people's lives, and the number of files used for AI training, such as pictures, corpora, and audio, keeps growing. Traditional stand-alone storage can no longer meet the industry's capacity requirements. At the same time, the accuracy of deep learning and related algorithms keeps improving, and open source implementations keep raising their computational efficiency, so the effect of storage on overall algorithm efficiency becomes more and more obvious. In this environment, file storage must cope with both massive data volumes and high performance demands. Current storage systems have to search efficiently, reduce storage and management costs, and improve reliability and performance. This project addresses these problems by building a middleware layer on top of the Ceph distributed storage solution. At present, the files stored by the company are mainly small files between 10 KB and 100 KB. The middleware, SenseAgent, is built according to the company's requirements and draws on current open source technologies such as Redis/Codis, TiKV, and etcd. It uses NVMe disks and memory, both fast and stable storage media, to implement two caching modes, and uses Ceph to provide the basic file storage service. For distributed transactions, etcd serves as the distributed lock solution. The file metadata service is built on the open source TiKV project. The server uses Thrift RPC and Go's GPM goroutine scheduling model to improve its IO processing capacity. The experimental results show that, when combined with the company's existing algorithm framework, the middleware still achieves higher performance than memcache.

This thesis first introduces the state of research on distributed file storage systems in China and abroad and some characteristics of mainstream distributed storage systems. It then describes the requirements analysis and architecture design of the small file storage middleware, and presents the design of the middleware on the Ceph architecture, including its overall scheme and logic flow. Next, it covers the detailed design of the middleware, including the data structure design, class design, and main function module design. Finally, performance tests of the middleware are carried out, a simple comparison with memcache is made, and the results are summarized.
Keywords/Search Tags: Storage, distributed middleware, Ceph, cache, high performance, RPC, metadata service
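
The abstract mentions a memory cache and an NVMe cache in front of Ceph but does not say how they interact. The following is a minimal Python sketch of one possible reading, a read-through path that checks a small memory tier, then a larger second tier, and only then the backing store. It is not the thesis's Go implementation: the names `LRUTier` and `TieredReader`, the LRU eviction policy, and the use of plain dictionaries in place of NVMe and Ceph are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUTier:
    """A fixed-capacity cache tier that evicts the least recently used key.

    Hypothetical stand-in for either the memory cache or the NVMe cache.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry


class TieredReader:
    """Read-through lookup: memory tier, then disk tier, then backing store."""

    def __init__(self, memory, disk, backend):
        self.memory = memory
        self.disk = disk
        self.backend = backend  # assumption: a dict standing in for Ceph
        self.backend_reads = 0  # counts how often the slow path is taken

    def read(self, name):
        data = self.memory.get(name)
        if data is not None:
            return data
        data = self.disk.get(name)
        if data is None:
            data = self.backend[name]  # raises KeyError for unknown files
            self.backend_reads += 1
            self.disk.put(name, data)
        self.memory.put(name, data)  # promote to the fastest tier
        return data


backend = {"a.jpg": b"A" * 10, "b.jpg": b"B" * 10}
reader = TieredReader(LRUTier(1), LRUTier(2), backend)
for name in ("a.jpg", "b.jpg", "a.jpg"):
    reader.read(name)
print(reader.backend_reads)
```

In this sketch the second read of `a.jpg` misses the one-entry memory tier but is served from the second tier, so the backing store is read only once per file. A real deployment would also need invalidation on writes, which the abstract assigns to etcd-based locking and which this sketch leaves out.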