
Performance Optimization Technology For Data Reading Of Deep Learning

Posted on: 2022-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2518306743951779
Subject: Master of Engineering
Abstract/Summary:
As a method of solving application problems such as image and text processing through machine learning algorithms on multi-layer neural networks, deep learning has been widely used in scientific and commercial fields. Current deep learning methods are still data-driven: to a certain extent, the larger the dataset used for training, the higher the model accuracy that can be obtained. In recent years, the data resources available on the Internet have become increasingly abundant, and the volume of datasets usable for model training has grown accordingly, posing new challenges to the I/O performance of storage systems. Traditional storage media and storage systems create a serious I/O bottleneck for deep learning training tasks: CPUs and GPUs spend large amounts of time waiting for training data to be loaded into their memory units. To ease this bottleneck, this thesis designs a set of data-reading optimization techniques for deep learning tasks that deliver substantial I/O performance improvements without requiring any modification to the underlying storage media or storage system. This deep-learning-oriented data-reading optimization consists of three novel designs:

1) Re-layout techniques for deep learning datasets. This design improves the format and layout of the dataset to avoid massive numbers of reads on small files. It mainly reduces the metadata management overhead of deep learning datasets and improves dataset reading efficiency.

2) A dataset-reading API for deep learning training frameworks. This technique integrates a set of data-reading APIs into widely used frameworks, so training tasks running on them gain I/O performance improvements without any modification to application code. This thesis takes the PyTorch framework as an example to discuss the module, which connects the framework to the new dataset format proposed in this thesis and implements the reading function.

3) An active storage framework that manages the lifecycle of a dataset. This technique offloads some data preprocessing tasks to storage nodes, so computing nodes can devote more computing resources to training. Together with the other two modules, it meets the shuffled-reading and random-augmentation preprocessing requirements of deep learning training tasks while providing excellent I/O reading efficiency.

Based on the above three designs, this scheme effectively improves data reading during deep learning and significantly shortens training time. Using the mainstream image recognition model ResNet-50 as the target application and ImageNet as the training dataset, the experimental results show that overall training time is reduced to less than one-third of the original.
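The re-layout idea in design 1) can be illustrated with a minimal sketch: many small sample files are packed into one large shard file, together with a byte-offset index, so each sample can later be fetched with a single seek and read instead of a per-file open. The helper names `pack_shard` and `read_sample` and the JSON index format are illustrative assumptions; the abstract does not specify the thesis's actual on-disk format.

```python
import json
import os


def pack_shard(sample_paths, shard_path):
    """Pack many small sample files into one large shard file.

    A JSON index of {name: (offset, length)} is written alongside, so a
    reader can locate any sample without per-file metadata lookups.
    (Illustrative format, not the thesis's actual layout.)
    """
    index = {}
    with open(shard_path, "wb") as shard:
        for path in sample_paths:
            with open(path, "rb") as f:
                data = f.read()
            # Record where this sample starts and how long it is.
            index[os.path.basename(path)] = (shard.tell(), len(data))
            shard.write(data)
    with open(shard_path + ".idx", "w") as f:
        json.dump(index, f)
    return index


def read_sample(shard_path, index, name):
    """Read one sample from the shard with a single seek + read."""
    offset, length = index[name]
    with open(shard_path, "rb") as shard:
        shard.seek(offset)
        return shard.read(length)
```

A framework-side reader (design 2) would wrap `read_sample` behind the framework's dataset interface, e.g. a PyTorch `Dataset` whose `__getitem__` maps an integer index to a shard entry, so training code needs no changes.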
Keywords/Search Tags:Deep learning, Dataset, Active storage, I/O read optimization