Research On Hardware Acceleration Method Of Deep Convolutional Neural Network Based On FPGA

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Huang
Full Text: PDF
GTID: 2428330626963487
Subject: Circuits and Systems
Abstract/Summary:
The convolutional neural network (CNN) is a research focus and hotspot in the field of deep learning and artificial intelligence. In recent years, with the development of algorithms and hardware technology, CNNs have been widely promoting the informatization and intelligentization of human society. At the same time, their huge computation and data volumes make hardware acceleration of CNNs inevitable. At present, computing acceleration platforms for CNNs fall into three main categories: GPU, ASIC, and FPGA. Although the GPU offers high development efficiency and good versatility, its high power consumption makes it difficult to exploit on embedded platforms. The ASIC achieves a high energy-efficiency ratio, but its long development cycle and high development cost make it difficult to adapt to a wide range of applications. An FPGA-based acceleration system combines the high development efficiency and good versatility of the GPU with the high energy-efficiency ratio of the ASIC, making it better suited to embedded applications.

Targeting the deployment of CNN algorithms on embedded and edge computing devices, and taking their hardware acceleration as the goal, this thesis carries out the following studies.

First, the CNN algorithm is studied in depth. Through splitting, recombination, and loop unrolling of the CNN computation, optimization methods for hardware acceleration are analyzed, and the focus of the hardware system design is determined.

Second, drawing on the strengths of the FPGA, low-level computation and memory optimization methods for building the acceleration system are proposed, including a configurable fixed-point arithmetic module for system operations, a serial-parallel conversion memory structure for data bit-width matching, and a dynamic-depth configurable FIFO for feature maps of different sizes.

Third, a modular and configurable CNN hardware acceleration system is proposed, based on SOPC hardware-software co-design. The convolution and fully connected layer operations are accelerated by a configurable parallel pipelined multiply-accumulate module, while the bias, activation, and pooling operations are accelerated by a configurable parallel activation-pooling module. In addition, a hardware design automation compiler is proposed to automatically generate the acceleration system's source and data files for a given model.

Finally, a CNN model for handwritten digit recognition on the MNIST data set is designed, and the acceleration system is tested on an EP4CE115F29C7 FPGA. When configured with 16-bit fixed-point numbers, 4 groups of parallel computation channels, and 16-input multiply-accumulate trees, the system uses 11% of the logic resources and 25% of the DSPs, runs stably at 100 MHz, and reaches a peak data throughput of 12.4 GOPs. The measured computation speed is 24.26 times that of an i5-6500 CPU, comparable to a GTX 750 GPU. The cumulative error of the output layer is less than 0.095 relative to a double-precision C implementation. The acceleration system thus accelerates CNN computation effectively, offers a high degree of configurability and portability, and is well suited to CNN acceleration on embedded platforms.
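As a minimal sketch of the fixed-point multiply-accumulate idea the abstract describes, the C model below quantizes values to a Q8.8 format and sums 16 products at double width, the way a balanced adder tree would in hardware. The Q8.8 bit split and the names to_fixed and mac16 are assumptions for illustration; the abstract states only "16-bit fixed-point" and a "16-input multiply-accumulate tree", and the actual design is RTL, not C.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed Q8.8 format: 16-bit word, 8 fractional bits. The thesis does
 * not state the integer/fraction split; this is an illustrative choice. */
#define FRAC_BITS 8

typedef int16_t fixed_t;

/* Quantize a floating-point weight/activation to fixed point. */
static fixed_t to_fixed(float x) {
    return (fixed_t)(x * (1 << FRAC_BITS));
}

static float to_float(fixed_t x) {
    return (float)x / (1 << FRAC_BITS);
}

/* Software model of a 16-input multiply-accumulate tree: each 16x16-bit
 * product is held at 32 bits and accumulated, mirroring how an adder
 * tree combines partial products without intermediate rounding. */
static fixed_t mac16(const fixed_t a[16], const fixed_t b[16]) {
    int32_t sum = 0;
    for (int i = 0; i < 16; i++) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    /* Rescale back to Q8.8; a real design would also saturate here. */
    return (fixed_t)(sum >> FRAC_BITS);
}

int main(void) {
    fixed_t w[16], x[16];
    for (int i = 0; i < 16; i++) {
        w[i] = to_fixed(0.25f);
        x[i] = to_fixed(0.5f);
    }
    /* Expected: 16 * (0.25 * 0.5) = 2.0 */
    printf("mac16 = %f\n", to_float(mac16(w, x)));
    return 0;
}
```

Keeping the accumulator at double width and rescaling only once at the output is what bounds the cumulative quantization error; the abstract's reported output-layer error below 0.095 versus double precision is consistent with this style of accumulation.

The dynamic-depth configurable FIFO for feature maps of different sizes can be sketched the same way. The ring-buffer model below assumes the depth is set once per layer, for example to one feature-map row for line buffering; fifo_t and its helper functions are hypothetical names, not the thesis's interface.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdio.h>

/* Software model of a depth-configurable FIFO: the depth is chosen at
 * initialization to match the current feature-map size, so one design
 * serves layers with different dimensions. */
typedef struct {
    int16_t *buf;
    size_t depth;            /* configured per layer */
    size_t head, tail, count;
} fifo_t;

static bool fifo_init(fifo_t *f, size_t depth) {
    f->buf = malloc(depth * sizeof(int16_t));
    if (!f->buf) return false;
    f->depth = depth;
    f->head = f->tail = f->count = 0;
    return true;
}

static bool fifo_push(fifo_t *f, int16_t v) {
    if (f->count == f->depth) return false;   /* full */
    f->buf[f->tail] = v;
    f->tail = (f->tail + 1) % f->depth;
    f->count++;
    return true;
}

static bool fifo_pop(fifo_t *f, int16_t *v) {
    if (f->count == 0) return false;          /* empty */
    *v = f->buf[f->head];
    f->head = (f->head + 1) % f->depth;
    f->count--;
    return true;
}

int main(void) {
    fifo_t f;
    /* E.g. depth 28 to buffer one row of a 28x28 MNIST feature map. */
    if (!fifo_init(&f, 28)) return 1;
    for (int16_t i = 0; i < 28; i++) fifo_push(&f, i);
    int16_t v;
    fifo_pop(&f, &v);
    printf("first out = %d\n", v);
    free(f.buf);
    return 0;
}
```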
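In an FPGA implementation the depth would typically be a runtime register rather than a malloc size, so the same line buffer can be reconfigured between layers without resynthesis; the C model above only captures that behavioral contract.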
Keywords/Search Tags:Deep Learning, Convolutional Neural Network, FPGA, Parallel computing, Edge Intelligence