Research And Implementation Of Storage Middle Layer For Deep Learning Platform

Posted on: 2019-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Zhang
Full Text: PDF
GTID: 2428330563492460
Subject: Computer system architecture
Abstract/Summary:
With the maturing of artificial neural network algorithms and the growth of machine computing power, deep learning has made major breakthroughs in many fields. The maturity of these algorithms is inseparable from the support of big data, but massive data also poses a great challenge to the storage system of a deep learning platform. Commonly used storage systems cannot adapt to the complex application scenarios of such a platform, so I/O becomes the bottleneck in its development. To solve these problems, this paper proposes a storage middle layer solution for deep learning platforms (SML-DLP). The bottom layer of the solution uses an object storage system as the reliable data store, because a deep learning platform mostly processes flat, unstructured data, for which object storage has advantages over file systems. A traditional file system consumes a large number of inodes when storing large amounts of data, which greatly reduces disk utilization. The design minimizes file metadata and ensures that access requests can complete all metadata queries in memory, which frees the system to spend more of its capacity reading actual data and increases throughput. SML-DLP meets the functional and performance requirements of the deep learning platform at three levels: a random processing layer, a data caching layer, and a data processing layer. For the functional requirements, the random processing layer randomizes the data set according to its file list to ensure the randomness of the data. For the performance requirements, small files are merged into a data block when data is written, and read performance is optimized through both batch reads and data caching. We implement the system according to the SML-DLP scheme and compare it with other systems. Experimental results show that the write performance of SML-DLP is 3 to 4 times that of the original Ceph, and that the performance of SML-DLP can reach the level of Memcached with a full cache hit. When connected to the training system, SML-DLP comes close to Memcached in terms of delay and performs far better than the original Ceph system.
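The two mechanisms named in the abstract, merging small files into one data block with metadata kept in memory and randomizing the data set from its file list, can be illustrated with a minimal sketch. This is not the thesis implementation: the class `PackedBlock`, the function `shuffled_batches`, and the use of an in-memory `bytearray` in place of an object written to an object store are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterable, List, Tuple


class PackedBlock:
    """Hypothetical container that merges many small files into one block."""

    def __init__(self) -> None:
        # Stand-in for a single object in the object store (assumption).
        self._buf = bytearray()
        # In-memory metadata: file name -> (offset, length) inside the block.
        self._index: Dict[str, Tuple[int, int]] = {}

    def append(self, name: str, data: bytes) -> None:
        """Append one small file to the block and record where it lives."""
        if name in self._index:
            raise KeyError(f"duplicate file name: {name}")
        self._index[name] = (len(self._buf), len(data))
        self._buf.extend(data)

    def read(self, name: str) -> bytes:
        """Look up metadata in memory, then slice the file out of the block."""
        offset, length = self._index[name]
        return bytes(self._buf[offset:offset + length])

    def names(self) -> List[str]:
        """Return the file list in insertion order."""
        return list(self._index)


def shuffled_batches(names: Iterable[str], batch_size: int, seed: int) -> List[List[str]]:
    """Shuffle the file list, then cut it into batches for batch reads."""
    order = list(names)
    random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


if __name__ == "__main__":
    block = PackedBlock()
    for i in range(5):
        block.append(f"img_{i}.bin", bytes([i]) * (i + 1))
    for batch in shuffled_batches(block.names(), batch_size=2, seed=0):
        print(batch, [len(block.read(name)) for name in batch])
```

In this sketch a read touches only the in-memory index and one contiguous slice of the block, which is the property the abstract attributes to keeping all metadata queries in memory; a real system would also need persistence, eviction, and concurrency control, none of which are shown here.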
Keywords/Search Tags:Deep Learning Platform, Object Storage System, Massive Small Files, Metadata