Design And Implementation Of Deep Learning Compiler Based On CNN Accelerator

Posted on: 2021-07-24  Degree: Master  Type: Thesis
Country: China  Candidate: F F Zhang  Full Text: PDF
GTID: 2518306047986179  Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
In the past decade, deep learning has raised the accuracy of image classification, speech recognition, and object detection to a practical level. Many technology giants and startups have entered the field and are striving to find viable application scenarios, and deep learning has begun to affect people's daily lives. Many hardware devices, including mobile terminals, embedded systems, microcontrollers, and various AI accelerators, are important deployment platforms for deep learning. Because different hardware platforms have different characteristics, it is difficult to achieve the best performance on every type of device: fully exploiting a device's capabilities requires optimizing both the model and the computing kernels for that specific hardware. Therefore, at the current stage, developing software tools for the various computing platforms and accelerators is one of the priorities in the field of deep learning applications; without such tools, hardware cannot reach its maximum energy efficiency or gain wide adoption.

Taking the challenges of deploying deep learning models on low-end embedded platforms and the need for software tools for AI chips as its starting point, this thesis studies the automatic deployment of deep learning models and implements a compiler that maps such models onto hardware. The main research work includes the following aspects:

1. Research on the end-to-end optimized deployment of convolutional neural networks, from top-level models down to the hardware architecture. The challenges of deploying deep learning models on low-end microprocessors are investigated, and a deep learning compiler structure suitable for SoC systems with a CNN accelerator is proposed. The compiler takes ONNX as its input format and, based on ONNX, defines a relatively simple intermediate representation that is easy for the compiler to implement yet has sufficient expressive power. The compiler uses this intermediate representation as the object of model optimization and transformation, and finally converts the model into C functions. By exploiting the good portability of the C language and the maturity of its toolchain, porting deep learning models to low-end devices is greatly simplified.

2. Research on neural network quantization. Quantization is currently the most widely used model compression method and an important model acceleration method. This thesis proposes a model quantization scheme tailored to the characteristics of the target hardware, and the entire quantization process is built into the compilation flow, which further simplifies model deployment.

3. Design and implementation of a compiler backend for the CNN accelerator, with architecture-specific schemes for computation management, memory management, and code generation. The acceleration principle of the accelerator's convolution engine and its limitations are studied, and a corresponding convolution implementation method is proposed.

Tests show that the compiler implemented in this thesis can optimize, quantize, and generate target-architecture code for pre-trained neural network models. Because deep learning and AI chips are both currently active research fields, deep learning compilers, which connect deep learning algorithms to their hardware implementations, have received increasing attention. Studying deep learning compilers not only deepens one's understanding of deep learning but also aids further study of other mature compiler frameworks.
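The model-to-C flow described above (intermediate representation in, portable C functions out) can be sketched as follows. This is a minimal illustration of the general technique, not the thesis's actual compiler: the `IRNode` layout and the kernel names (`cnn_conv2d`, `cnn_relu`) are hypothetical.

```python
# Toy lowering pass: a tiny IR graph is translated into a C function
# that calls into a hand-written kernel library on the target device.
from dataclasses import dataclass, field

@dataclass
class IRNode:
    op: str                      # operator type, e.g. "conv2d" or "relu"
    inputs: list = field(default_factory=list)  # input tensor names
    output: str = ""             # produced tensor name

# A two-layer toy model: convolution followed by ReLU.
graph = [
    IRNode("conv2d", ["input", "weight0"], "t0"),
    IRNode("relu", ["t0"], "output"),
]

# Each IR operator maps to a call into the C kernel library (names hypothetical).
C_KERNELS = {"conv2d": "cnn_conv2d", "relu": "cnn_relu"}

def emit_c(graph, func_name="run_model"):
    """Lower the IR graph into the text of one portable C function."""
    lines = [f"void {func_name}(void) {{"]
    for node in graph:
        args = ", ".join(node.inputs + [node.output])
        lines.append(f"    {C_KERNELS[node.op]}({args});")
    lines.append("}")
    return "\n".join(lines)

print(emit_c(graph))
```

Emitting C rather than machine code is what buys the portability the abstract mentions: the generated function compiles with any C toolchain for the target microprocessor.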
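The quantization step can be illustrated with standard symmetric int8 post-training quantization, the textbook technique such schemes are generally built on; the thesis's hardware-specific scheme may differ in its details.

```python
# Symmetric per-tensor int8 quantization sketch (a general illustration,
# not the thesis's exact scheme): floats are mapped to integers via a
# single scale derived from the largest absolute value in the tensor.

def quantize(values, num_bits=8):
    """Map float values to signed num_bits integers plus a scale factor."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize(weights)       # integers fit in int8 on the accelerator
approx = dequantize(q, scale)      # each value is off by at most scale / 2
```

Folding this step into compilation, as the thesis does, means the user hands the compiler a float model and receives quantized code without a separate conversion tool.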
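As for the convolution implementation on the accelerator backend, a common principle for running convolution on matrix-multiply style CNN accelerators is im2col lowering; the sketch below shows that general idea and is not the specific method designed in the thesis.

```python
# im2col sketch: unfold k x k image patches into rows, then compute the
# convolution as one dot product per row, the shape a matrix engine wants.

def im2col(image, k):
    """Unfold all k x k patches of a 2D image (list of rows) into flat rows."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]

def conv2d_via_matmul(image, kernel):
    """Valid (no padding, stride 1) convolution as per-patch dot products."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    return [sum(a * b for a, b in zip(patch, flat_kernel))
            for patch in im2col(image, k)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]   # sums the top-left and bottom-right of each 2x2 patch
out = conv2d_via_matmul(image, kernel)   # flattened 2x2 output: [6, 8, 12, 14]
```

The trade-off such a lowering exposes, and one a backend's memory management must handle, is that the unfolded patch matrix duplicates input data and can exceed on-chip buffer capacity, which motivates tiling.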
Keywords/Search Tags:Convolutional neural network, Accelerator, Model deployment, Compiler