
Research On CNN Network Acceleration For Image Classification Based On FPGA

Posted on: 2022-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: T D Guo
Full Text: PDF
GTID: 2518306605998189
Subject: Electronics and Communications Engineering
Abstract/Summary:
The high accuracy of convolutional neural networks (CNNs) has led to their wide deployment in computer vision applications such as autonomous driving, human-computer interaction, and mobile robotics. However, large model sizes and extremely high computing-power requirements have become the main bottlenecks limiting the deployment of CNNs in mobile scenarios. In recent years, therefore, considerable research has been devoted to the design of lightweight networks and high-performance hardware accelerators. Depthwise separable CNNs (DSCNNs), represented by MobileNets, greatly reduce the number of parameters and the amount of computation; they are favored by researchers and have been deployed on GPU-, FPGA-, and ASIC-based platforms. Among these platforms, FPGAs have become a highly sought-after research target by virtue of their reconfigurability. However, previous FPGA-based CNN accelerators mostly focus on peak performance or on a specific implementation, and often over-rely on the large resources and high bandwidth of advanced FPGAs, which makes backward compatibility difficult.

The main contribution of this thesis is the design of a scalable, lightweight FPGA accelerator framework for convolutional neural networks. First, starting from the key technologies of accelerator design, this thesis analyzes and determines the basic schemes for the computing engine, data flow, control system, and data quantization, and then designs a multi-size convolution computing engine that is compatible with various operations. To address the difficulty of applying layer-fusion optimization directly to DSCNNs, this thesis proposes a multi-directional fused-convolution computation-ordering method. Based on this method, the proposed accelerator framework realizes partial convolutional-layer fusion without caching the intermediate output feature maps, which greatly reduces on-chip memory requirements and off-chip memory accesses. In addition, the accelerator's control system and the off-chip memory address space are optimized, making the framework more flexible and off-chip memory access more efficient.

Finally, this thesis presents a complete simulation of the proposed accelerator, verifying that everything from individual modules to the whole framework functions as expected, and evaluates its performance through board-level tests that deploy accelerators of multiple sizes on FPGA platforms with different resources and bandwidth. The evaluation results show that the proposed accelerator outperforms a CPU in throughput even on older FPGA platforms with limited resources. On a higher-performance FPGA platform, the results surpass recent accelerator research and exceed a GPU in power efficiency by more than 4x. This confirms that the proposed accelerator framework achieves both high computational performance and excellent scalability.
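To make the DSCNN cost reduction mentioned above concrete, the following is a minimal sketch (not taken from the thesis; the layer shape is an illustrative MobileNet-style example) comparing the parameter and multiply-accumulate (MAC) counts of a standard convolution against a depthwise separable one:

```python
# Illustrative cost comparison: standard vs. depthwise separable convolution.
# Layer shape below is an assumed example, not a figure from the thesis.

def standard_conv_cost(h, w, c_in, c_out, k):
    """Parameters and MACs of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = h * w * params          # one k*k*c_in dot product per output pixel and channel
    return params, macs

def depthwise_separable_cost(h, w, c_in, c_out, k):
    """Cost of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    dw_params = k * k * c_in       # one k x k filter per input channel
    pw_params = c_in * c_out       # 1 x 1 cross-channel mixing
    macs = h * w * (dw_params + pw_params)
    return dw_params + pw_params, macs

# Example layer: 112x112 feature map, 32 -> 64 channels, 3x3 kernel.
std = standard_conv_cost(112, 112, 32, 64, 3)
dsc = depthwise_separable_cost(112, 112, 32, 64, 3)
print(f"standard:  {std[0]:,} params, {std[1]:,} MACs")
print(f"separable: {dsc[0]:,} params, {dsc[1]:,} MACs (~{std[1] / dsc[1]:.1f}x fewer)")
```

For this example the separable form needs roughly 8x fewer parameters and MACs, which is the property that makes DSCNNs attractive for resource-constrained FPGA deployment.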
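The thesis's multi-directional fusion ordering is only summarized above. Purely as an illustration of the underlying idea, the sketch below (an assumption-laden reconstruction, not the thesis's design) streams a 3x3 depthwise convolution directly into the following 1x1 pointwise convolution one row at a time, so that only a single-row buffer, rather than the full intermediate feature map, ever needs to be cached:

```python
import numpy as np

def fused_dw_pw(x, dw_w, pw_w):
    """Stream a 3x3 depthwise conv straight into a 1x1 pointwise conv,
    one output row at a time, so the full intermediate feature map is
    never materialized; only a single-row buffer is kept.

    x:    input feature map, shape (C_in, H, W), already zero-padded
    dw_w: depthwise weights, shape (C_in, 3, 3)
    pw_w: pointwise weights, shape (C_out, C_in)
    """
    c_in, h, w = x.shape
    c_out = pw_w.shape[0]
    out = np.zeros((c_out, h - 2, w - 2))
    for r in range(h - 2):                 # slide a 3-row window down the input
        row_buf = np.zeros((c_in, w - 2))  # depthwise result for this row only
        for c in range(c_in):
            for j in range(w - 2):
                row_buf[c, j] = np.sum(x[c, r:r + 3, j:j + 3] * dw_w[c])
        out[:, r, :] = pw_w @ row_buf      # pointwise consumes the row at once
    return out
```

On an FPGA the row buffer would live in on-chip BRAM; consuming each depthwise row immediately is what allows a fused schedule to cut on-chip memory requirements and off-chip memory accesses, as the abstract claims for the proposed framework.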
Keywords/Search Tags: algorithm-hardware acceleration, convolutional neural network, depthwise separable convolution, FPGA, lightweight convolutional accelerator, storage optimization method