
Performance Optimization Technology For Data Reading Of Deep Learning

Posted on: 2022-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2518306743951779
Subject: Master of Engineering
Abstract/Summary:
As a method of solving application problems such as image and text processing through machine learning algorithms on multi-layer neural networks, deep learning has been widely used in scientific and commercial fields. Current deep learning methods are still data-driven: to a certain extent, the larger the dataset used for training, the higher the model accuracy that can be obtained. In recent years, the data resources available on the Internet have become increasingly abundant, and the volume of datasets usable for model training has grown accordingly, posing new challenges to the I/O performance of storage systems. Traditional storage media and storage systems create a serious I/O bottleneck for deep learning training tasks: CPUs and GPUs spend large amounts of time waiting for training data to be loaded into their memory units. To ease this bottleneck, this thesis designs a set of data-reading optimization techniques for deep learning tasks that deliver substantial I/O performance improvements without requiring any modification to the underlying storage media or storage system. This deep-learning-oriented data-reading optimization consists of three novel designs:

1) Re-layout techniques for deep learning datasets. This design improves the format and layout of the dataset to avoid massive numbers of reads on small files. It mainly reduces the metadata management overhead of deep learning datasets and improves dataset reading efficiency.

2) A dataset-reading API for deep learning training frameworks. This technique integrates a set of data-reading APIs into widely used frameworks, so training tasks running on them gain I/O performance improvements without any modification to application code. This thesis takes the PyTorch framework as an example to discuss the module, which connects the framework to the new dataset format proposed in this thesis and implements the reading function.

3) An active storage framework that manages the lifecycle of a dataset. This technique offloads some data preprocessing tasks to storage nodes, so computing nodes can devote more computing resources to training. Together with the other two modules, it meets the shuffled-reading and random-augmentation preprocessing requirements of deep learning training tasks while providing excellent I/O reading efficiency.

Based on the above three designs, this scheme effectively improves data reading during deep learning and significantly shortens training time. Using the mainstream image recognition model ResNet-50 as the target application and ImageNet as the training dataset, the experimental results show that overall training time is reduced to less than one-third of the original.
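The re-layout idea in design 1) can be illustrated with a minimal sketch: many small sample files are packed into one large shard file, together with a byte-offset index, so each sample can later be fetched with a single seek and read instead of a per-file open. The helper names `pack_shard` and `read_sample` and the JSON index format are illustrative assumptions; the abstract does not specify the thesis's actual on-disk format.

```python
import json
import os


def pack_shard(sample_paths, shard_path):
    """Pack many small sample files into one large shard file.

    A JSON index of {name: (offset, length)} is written alongside, so a
    reader can locate any sample without per-file metadata lookups.
    (Illustrative format, not the thesis's actual layout.)
    """
    index = {}
    with open(shard_path, "wb") as shard:
        for path in sample_paths:
            with open(path, "rb") as f:
                data = f.read()
            # Record where this sample starts and how long it is.
            index[os.path.basename(path)] = (shard.tell(), len(data))
            shard.write(data)
    with open(shard_path + ".idx", "w") as f:
        json.dump(index, f)
    return index


def read_sample(shard_path, index, name):
    """Read one sample from the shard with a single seek + read."""
    offset, length = index[name]
    with open(shard_path, "rb") as shard:
        shard.seek(offset)
        return shard.read(length)
```

A framework-side reader (design 2) would wrap `read_sample` behind the framework's dataset interface, e.g. a PyTorch `Dataset` whose `__getitem__` maps an integer index to a shard entry, so training code needs no changes.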
Keywords/Search Tags:Deep learning, Dataset, Active storage, I/O read optimization