
Research on Computing and Deployment Optimization of DNN Models on the BWDSP Platform

Posted on: 2020-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: J P Yang
Full Text: PDF
GTID: 2428330572474168
Subject: Computer software and theory
Abstract/Summary:
In recent years, thanks to large-scale growth in computing power and data volume, deep neural networks have achieved beyond-human performance in fields such as image classification, object detection, and speech recognition. Many researchers and developers have industrialized deep learning technology, and hardware such as mobile devices, embedded systems, and various accelerators has gradually become its preferred deployment platform. However, the computing power and storage space of these platforms are very limited, and directly deploying deep neural network models onto such resource-constrained platforms faces many difficulties.

One solution to the storage and computation problem of deep learning is to upload the model to the cloud and run the computing tasks there, so that the end device only handles data communication: sending images and receiving results. However, this scheme not only depends on the stability of the network but also risks leaking personal privacy. Many researchers therefore work from two angles, computational efficiency and model compression, to reduce the cost of computing and deploying deep learning models on end devices. Among embedded processors, the digital signal processor (DSP) offers relatively strong computing power, making it better suited than other processors to computationally intensive tasks such as deep learning. Targeting this topic, this thesis proposes a series of optimization strategies for efficient computation and rapid deployment of DNN models on the embedded processor BWDSP, and develops a deep learning computing library, bwDNN, together with a compilation-and-deployment framework, BWVM, that maps workloads from a top-level description down to the BWDSP hardware platform. In summary, the main contributions of this thesis are:

1. Implementing and optimizing the deep learning computation library bwDNN on BWDSP, including the Forward Computation, Backward Computation, Mathematical Operation, and Memory Management modules (a reference sketch of the forward-convolution semantics follows item 2 below). Experimental results show that the peak performance of convolution and deconvolution on the BWDSP platform reaches 11.07 GFLOPS, about 86.5% of the platform's theoretical peak. Meanwhile, under equivalent workloads and hardware resources, bwDNN delivers 1.96 to 2.30 times the computational performance of Intel's deep learning computing library MKL-DNN.

2. Designing a model compression strategy driven by the constraints of model size and hardware resources. The strategy starts from a pre-trained model described and constructed in the top-level deep learning framework TensorFlow; taking the model's size and the target hardware resources together, it computes a compression ratio for each layer, producing a lightweight model that runs efficiently on the resource-constrained BWDSP (a toy sketch of this per-layer allocation follows the convolution sketch below). Experiments show that the proposed strategy outperforms empirical strategies such as unified compression, the shallow strategy, and the deep strategy: at a 50% FLOPs reduction it is more accurate on the CIFAR-10 dataset than these experience-based compression rules.
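The abstract does not expose bwDNN's actual API, so the following is only a minimal NumPy reference for the semantics that a Forward Computation module's convolution computes; the function name conv2d_forward, the data layout, and the omission of padding and bias are all assumptions, not the library's interface.

```python
import numpy as np

def conv2d_forward(x, w, stride=1):
    """Reference forward convolution (cross-correlation, as in most DNN
    libraries). x: (C_in, H, W) input; w: (C_out, C_in, KH, KW) filters.
    No padding or bias; names and layout are hypothetical, not bwDNN's."""
    c_in, h, wid = x.shape
    c_out, _, kh, kw = w.shape
    oh = (h - kh) // stride + 1
    ow = (wid - kw) // stride + 1
    y = np.zeros((c_out, oh, ow), dtype=x.dtype)
    for co in range(c_out):          # one output channel at a time
        for i in range(oh):
            for j in range(ow):
                patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
                y[co, i, j] = np.sum(patch * w[co])
    return y

# Tiny smoke test: 3-channel 8x8 input, four 3x3 filters -> (4, 6, 6)
y = conv2d_forward(np.random.rand(3, 8, 8), np.random.rand(4, 3, 3, 3))
assert y.shape == (4, 6, 6)
```

An optimized DSP implementation would of course replace these loops with vectorized kernels; the sketch only pins down what the module computes.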
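For contribution 2, the abstract states only that each layer's compression ratio is computed from the overall model size and the hardware budget; the allocation rule below (pruning larger layers harder via an inverse-square-root weighting, with a floor on the keep ratio) is an invented illustration of that idea, not the thesis's actual formula.

```python
import math

def per_layer_keep_ratios(layer_params, budget, floor=0.1):
    """Toy per-layer compression: given each layer's parameter count and a
    hardware parameter budget, return the fraction of weights to keep per
    layer. Heuristic (assumed, not from the thesis): weight layers by
    1/sqrt(size) so bigger layers are pruned harder, then scale so the kept
    total meets the budget; clamping to [floor, 1] makes the fit approximate."""
    total = sum(layer_params)
    if budget >= total:
        return [1.0] * len(layer_params)
    weights = [1.0 / math.sqrt(p) for p in layer_params]
    # kept_i = r_i * p_i with r_i = c * weights_i; solve c from the budget
    c = budget / sum(w * p for w, p in zip(weights, layer_params))
    return [min(1.0, max(floor, c * w)) for w in weights]

# Example: a 4-layer model squeezed into a budget of 40% of its size
sizes = [1_200_000, 600_000, 300_000, 50_000]
ratios = per_layer_keep_ratios(sizes, budget=0.4 * sum(sizes))
# Larger layers get smaller keep ratios, e.g. ratios[0] < ratios[3]
```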
3. Presenting BWVM, an automated compilation and deployment tool that carries a top-level model description down to the back-end hardware, covering analysis and reconstruction of the deep learning framework's model computation graph, graph optimization, and code generation for the target platform (a miniature graph-optimization pass is sketched below). Experimental results show that graph optimization and automatic code generation effectively improve code quality: compared with the handwritten convolution code in the bwDNN library, the code generated by BWVM raises hardware resource utilization by 7.3% and reduces memory access overhead by 8.4%.
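BWVM's intermediate representation and pass list are not described in the abstract, so the following is a hypothetical miniature of one classic graph optimization: fusing a conv2d node with the relu that consumes it, so the code generator can emit a single kernel instead of two. All op and field names here are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                     # operator name, e.g. "conv2d"
    inputs: list = field(default_factory=list)  # names of producer nodes
    name: str = ""                              # name of this node's output

def fuse_conv_relu(graph):
    """Rewrite conv2d -> relu pairs into one fused node. Assumes the graph
    list is topologically ordered; a toy stand-in, not BWVM's real pass."""
    consumers = {}
    for n in graph:
        for i in n.inputs:
            consumers.setdefault(i, []).append(n)
    out, fused_away = [], set()
    for n in graph:
        if n.name in fused_away:
            continue
        users = consumers.get(n.name, [])
        if n.op == "conv2d" and len(users) == 1 and users[0].op == "relu":
            relu = users[0]
            # The fused node takes the conv's inputs but the relu's output
            # name, so downstream consumers of the relu are untouched.
            out.append(Node("conv2d_relu", n.inputs, relu.name))
            fused_away.add(relu.name)
        else:
            out.append(n)
    return out

# x -> conv2d -> relu -> output   becomes   x -> conv2d_relu -> output
g = [Node("input", [], "x"), Node("const", [], "w"),
     Node("conv2d", ["x", "w"], "c1"), Node("relu", ["c1"], "r1"),
     Node("output", ["r1"], "y")]
assert [n.op for n in fuse_conv_relu(g)] == ["input", "const",
                                             "conv2d_relu", "output"]
```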
In summary, this thesis applies the above three strategies to deploy DNN models on edge devices for offline inference and AI-related tasks. It mainly studies how to run deep learning models and optimize their computation so as to achieve real-time inference or fast model fine-tuning, and it also examines the software stack from the user interface, through the neural-network intermediate representation and its optimization, down to low-level code generation.

Keywords/Search Tags: DNNs, Forward computation, Backward computation, Model compression, Graph optimization, Compilation and Deployment