In recent years, deep learning accelerators and deep learning models have grown increasingly complex, and the computing power demanded by artificial intelligence applications has risen rapidly, placing higher requirements on intelligent software and hardware. At the same time, driven by chip supply bottlenecks and the demand for edge computing power, most software and hardware technology giants and start-ups have developed their own dedicated acceleration hardware. However, further improving hardware performance utilization and the ease of use of intelligent chips, and thereby expanding their market, is impossible without the chip software stack: the hardware architecture determines the peak computing power, but the performance actually achieved is determined by the compilation framework above it. In addition, a single hardware architecture cannot meet the computational requirements of the operators found in complex application scenarios, and manual optimization is difficult. To this end, this thesis studies compilation techniques for heterogeneous platforms and aims to design a deep learning compilation and optimization stack that forms a complete artificial intelligence software and hardware ecosystem, supporting the optimal deployment of models in different application scenarios.

The main research work and contributions of this thesis are as follows:
(1) Designed and implemented an end-to-end model optimization stack for heterogeneous platforms built around dedicated convolutional neural network accelerators.
(2) Proposed a new channel pruning algorithm and adapted it to the intermediate representation designed in this thesis, reducing the engineering cost of applying the compression algorithm across different deep learning frameworks.
(3) Through subgraph splitting, operators and subgraphs not supported by the accelerator are offloaded and converted to an intermediate representation based on the TVM deep learning compiler for joint optimization.
(4) Designed and implemented a visual front-end interface that greatly lowers the barrier to using the compiler.

Experimental results show that the proposed compiler (Tiangong Neural Network Compiler) can optimize models according to hardware characteristics and generate executable code for heterogeneous systems, greatly improving the efficiency and effectiveness of intelligent-chip application development. On a ship target-detection task, with an accuracy loss of no more than 1%, inference speed is increased to 1.3 times on general-purpose devices and to 1.6 times on the dedicated convolutional neural network accelerator, demonstrating that the stack effectively accelerates convolutional neural networks in deployment.
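To illustrate the kind of compression described in contribution (2), below is a minimal sketch of magnitude-based channel pruning, a standard approach in which the convolution filters with the smallest L1 norms are removed. The `prune_channels` function and the `keep_ratio` parameter are illustrative assumptions, not the algorithm actually proposed in the thesis.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """L1-norm channel pruning sketch (illustrative, not the thesis algorithm).

    weight: conv kernel of shape (out_channels, in_channels, kh, kw).
    Keeps the output channels whose filters have the largest L1 norms.
    Returns the pruned weight and the (sorted) indices of kept channels.
    """
    out_channels = weight.shape[0]
    n_keep = max(1, int(out_channels * keep_ratio))
    # One L1 norm per output-channel filter.
    norms = np.abs(weight).reshape(out_channels, -1).sum(axis=1)
    # Indices of the n_keep largest norms, restored to ascending order.
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weight[keep], keep

# Toy example: prune half of the 8 filters of a 3x3 conv layer.
w = np.random.randn(8, 3, 3, 3)
pruned, kept = prune_channels(w, keep_ratio=0.5)
```

In a real pipeline the corresponding input channels of the following layer must be pruned as well, which is the cross-layer bookkeeping that an intermediate representation can automate across frameworks.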
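The subgraph splitting in contribution (3) can be sketched as a greedy partition of an operator sequence: runs of accelerator-supported operators stay together, while unsupported operators are offloaded to the TVM-based fallback path. The `partition` helper, the operator names, and the "accel"/"tvm" target labels are hypothetical and only convey the general idea, not the actual TNNC implementation.

```python
def partition(ops, supported):
    """Split a linear operator sequence into maximal contiguous segments,
    each targeting either the accelerator ("accel") or a TVM-compiled
    fallback ("tvm"), depending on operator support.
    """
    segments = []
    for op in ops:
        target = "accel" if op in supported else "tvm"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)  # extend the current segment
        else:
            segments.append((target, [op]))  # start a new segment
    return segments

# Example: conv2d/relu/concat run on the accelerator; nms falls back to TVM.
ops = ["conv2d", "relu", "conv2d", "nms", "concat"]
supported = {"conv2d", "relu", "concat"}
segments = partition(ops, supported)
# segments == [("accel", ["conv2d", "relu", "conv2d"]),
#              ("tvm", ["nms"]),
#              ("accel", ["concat"])]
```

A real compiler partitions a dataflow graph rather than a linear sequence, but the principle is the same: maximize the size of accelerator-resident subgraphs to minimize host/device transfers at segment boundaries.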