Research On Hardware Acceleration Method Of Deep Convolutional Neural Network Based On FPGA

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Huang
Full Text: PDF
GTID: 2428330626963487
Subject: Circuits and Systems
Abstract/Summary:
The convolutional neural network (CNN) is a research focus and hotspot in the field of deep learning and artificial intelligence. In recent years, with the development of algorithms and hardware technology, CNNs have been widely promoting the informatization and intelligentization of human society. At the same time, their huge computation and data volumes make hardware acceleration of CNNs inevitable. At present, computing acceleration platforms for CNNs fall into three main categories: GPU, ASIC, and FPGA. Although the GPU offers high development efficiency and good versatility, its high power consumption makes it difficult to exploit on embedded platforms. The ASIC achieves a high energy-efficiency ratio, but its long development cycle and high development cost make it difficult to adapt to a wide range of applications. An FPGA-based acceleration system combines the high development efficiency and good versatility of the GPU with the high energy-efficiency ratio of the ASIC, making it better suited to embedded applications.

Targeting the deployment of CNN algorithms on embedded and edge computing devices, and taking their hardware acceleration as the goal, this thesis carries out the following studies.

First, the CNN algorithm is studied in depth. Through splitting, recombination, and loop unrolling of the CNN computation, optimization methods for hardware acceleration are analyzed, and the focus of the hardware system design is determined.

Second, drawing on the strengths of the FPGA, low-level computation and memory optimization methods for building the acceleration system are proposed, including a configurable fixed-point arithmetic module for system operations, a serial-parallel conversion memory structure for data bit-width matching, and a dynamic-depth configurable FIFO for feature maps of different sizes.

Third, a modular and configurable CNN hardware acceleration system is proposed, based on SOPC hardware-software co-design. The convolution and fully connected layer operations are accelerated by a configurable parallel pipelined multiply-accumulate module, while the bias, activation, and pooling operations are accelerated by a configurable parallel activation-pooling module. In addition, a hardware design automation compiler is proposed to automatically generate the acceleration system's source and data files for a given model.

Finally, a CNN model for handwritten digit recognition on the MNIST data set is designed, and the acceleration system is tested on an EP4CE115F29C7 FPGA. When configured with 16-bit fixed-point numbers, 4 groups of parallel computation channels, and 16-input multiply-accumulate trees, the system uses 11% of the logic resources and 25% of the DSPs, runs stably at 100 MHz, and reaches a peak data throughput of 12.4 GOPs. The measured computation speed is 24.26 times that of an i5-6500 CPU, comparable to a GTX 750 GPU. The cumulative error of the output layer is less than 0.095 relative to a double-precision C implementation. The acceleration system thus accelerates CNN computation effectively, offers a high degree of configurability and portability, and is well suited to CNN acceleration on embedded platforms.
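As a minimal sketch of the fixed-point multiply-accumulate idea the abstract describes, the C model below quantizes values to a Q8.8 format and sums 16 products at double width, the way a balanced adder tree would in hardware. The Q8.8 bit split and the names to_fixed and mac16 are assumptions for illustration; the abstract states only "16-bit fixed-point" and a "16-input multiply-accumulate tree", and the actual design is RTL, not C.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed Q8.8 format: 16-bit word, 8 fractional bits. The thesis does
 * not state the integer/fraction split; this is an illustrative choice. */
#define FRAC_BITS 8

typedef int16_t fixed_t;

/* Quantize a floating-point weight/activation to fixed point. */
static fixed_t to_fixed(float x) {
    return (fixed_t)(x * (1 << FRAC_BITS));
}

static float to_float(fixed_t x) {
    return (float)x / (1 << FRAC_BITS);
}

/* Software model of a 16-input multiply-accumulate tree: each 16x16-bit
 * product is held at 32 bits and accumulated, mirroring how an adder
 * tree combines partial products without intermediate rounding. */
static fixed_t mac16(const fixed_t a[16], const fixed_t b[16]) {
    int32_t sum = 0;
    for (int i = 0; i < 16; i++) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    /* Rescale back to Q8.8; a real design would also saturate here. */
    return (fixed_t)(sum >> FRAC_BITS);
}

int main(void) {
    fixed_t w[16], x[16];
    for (int i = 0; i < 16; i++) {
        w[i] = to_fixed(0.25f);
        x[i] = to_fixed(0.5f);
    }
    /* Expected: 16 * (0.25 * 0.5) = 2.0 */
    printf("mac16 = %f\n", to_float(mac16(w, x)));
    return 0;
}
```

Keeping the accumulator at double width and rescaling only once at the output is what bounds the cumulative quantization error; the abstract's reported output-layer error below 0.095 versus double precision is consistent with this style of accumulation.

The dynamic-depth configurable FIFO for feature maps of different sizes can be sketched the same way. The ring-buffer model below assumes the depth is set once per layer, for example to one feature-map row for line buffering; fifo_t and its helper functions are hypothetical names, not the thesis's interface.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdio.h>

/* Software model of a depth-configurable FIFO: the depth is chosen at
 * initialization to match the current feature-map size, so one design
 * serves layers with different dimensions. */
typedef struct {
    int16_t *buf;
    size_t depth;            /* configured per layer */
    size_t head, tail, count;
} fifo_t;

static bool fifo_init(fifo_t *f, size_t depth) {
    f->buf = malloc(depth * sizeof(int16_t));
    if (!f->buf) return false;
    f->depth = depth;
    f->head = f->tail = f->count = 0;
    return true;
}

static bool fifo_push(fifo_t *f, int16_t v) {
    if (f->count == f->depth) return false;   /* full */
    f->buf[f->tail] = v;
    f->tail = (f->tail + 1) % f->depth;
    f->count++;
    return true;
}

static bool fifo_pop(fifo_t *f, int16_t *v) {
    if (f->count == 0) return false;          /* empty */
    *v = f->buf[f->head];
    f->head = (f->head + 1) % f->depth;
    f->count--;
    return true;
}

int main(void) {
    fifo_t f;
    /* E.g. depth 28 to buffer one row of a 28x28 MNIST feature map. */
    if (!fifo_init(&f, 28)) return 1;
    for (int16_t i = 0; i < 28; i++) fifo_push(&f, i);
    int16_t v;
    fifo_pop(&f, &v);
    printf("first out = %d\n", v);
    free(f.buf);
    return 0;
}
```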
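In an FPGA implementation the depth would typically be a runtime register rather than a malloc size, so the same line buffer can be reconfigured between layers without resynthesis; the C model above only captures that behavioral contract.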
Keywords/Search Tags:Deep Learning, Convolutional Neural Network, FPGA, Parallel computing, Edge Intelligence