
Research On CNN Network Acceleration For Image Classification Based On FPGA

Posted on: 2022-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: T D Guo
Full Text: PDF
GTID: 2518306605998189
Subject: Electronics and Communications Engineering
Abstract/Summary:
The high accuracy of convolutional neural networks (CNNs) has led to their wide deployment in computer vision applications such as autonomous driving, human-computer interaction, and mobile robotics. However, large model sizes and extremely high computing-power requirements have become the main bottlenecks limiting the deployment of CNNs in mobile scenarios. In recent years, therefore, considerable research has been devoted to the design of lightweight networks and high-performance hardware accelerators. Depthwise separable CNNs (DSCNNs), represented by MobileNets, greatly reduce the number of parameters and the amount of computation; they are favored by researchers and have been deployed on GPU-, FPGA-, and ASIC-based platforms. Among these platforms, FPGAs have become a highly sought-after research target by virtue of their reconfigurability. However, previous FPGA-based CNN accelerators mostly focus on peak performance or on a specific implementation, and often over-rely on the large resources and high bandwidth of advanced FPGAs, which makes backward compatibility difficult.

The main contribution of this thesis is the design of a scalable, lightweight FPGA accelerator framework for convolutional neural networks. First, starting from the key technologies of accelerator design, this thesis analyzes and determines the basic schemes for the computing engine, data flow, control system, and data quantization, and then designs a multi-size convolution computing engine that is compatible with various operations. To address the difficulty of applying layer-fusion optimization directly to DSCNNs, this thesis proposes a multi-directional fused-convolution computation-ordering method. Based on this method, the proposed accelerator framework realizes partial convolutional-layer fusion without caching the intermediate output feature maps, which greatly reduces on-chip memory requirements and off-chip memory accesses. In addition, the accelerator's control system and the off-chip memory address space are optimized, making the framework more flexible and off-chip memory access more efficient.

Finally, this thesis presents a complete simulation of the proposed accelerator, verifying that everything from individual modules to the whole framework functions as expected, and evaluates its performance through board-level tests that deploy accelerators of multiple sizes on FPGA platforms with different resources and bandwidth. The evaluation results show that the proposed accelerator outperforms a CPU in throughput even on older FPGA platforms with limited resources. On a higher-performance FPGA platform, the results surpass recent accelerator research and exceed a GPU in power efficiency by more than 4x. This confirms that the proposed accelerator framework achieves both high computational performance and excellent scalability.
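To make the DSCNN cost reduction mentioned above concrete, the following is a minimal sketch (not taken from the thesis; the layer shape is an illustrative MobileNet-style example) comparing the parameter and multiply-accumulate (MAC) counts of a standard convolution against a depthwise separable one:

```python
# Illustrative cost comparison: standard vs. depthwise separable convolution.
# Layer shape below is an assumed example, not a figure from the thesis.

def standard_conv_cost(h, w, c_in, c_out, k):
    """Parameters and MACs of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = h * w * params          # one k*k*c_in dot product per output pixel and channel
    return params, macs

def depthwise_separable_cost(h, w, c_in, c_out, k):
    """Cost of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    dw_params = k * k * c_in       # one k x k filter per input channel
    pw_params = c_in * c_out       # 1 x 1 cross-channel mixing
    macs = h * w * (dw_params + pw_params)
    return dw_params + pw_params, macs

# Example layer: 112x112 feature map, 32 -> 64 channels, 3x3 kernel.
std = standard_conv_cost(112, 112, 32, 64, 3)
dsc = depthwise_separable_cost(112, 112, 32, 64, 3)
print(f"standard:  {std[0]:,} params, {std[1]:,} MACs")
print(f"separable: {dsc[0]:,} params, {dsc[1]:,} MACs (~{std[1] / dsc[1]:.1f}x fewer)")
```

For this example the separable form needs roughly 8x fewer parameters and MACs, which is the property that makes DSCNNs attractive for resource-constrained FPGA deployment.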
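The thesis's multi-directional fusion ordering is only summarized above. Purely as an illustration of the underlying idea, the sketch below (an assumption-laden reconstruction, not the thesis's design) streams a 3x3 depthwise convolution directly into the following 1x1 pointwise convolution one row at a time, so that only a single-row buffer, rather than the full intermediate feature map, ever needs to be cached:

```python
import numpy as np

def fused_dw_pw(x, dw_w, pw_w):
    """Stream a 3x3 depthwise conv straight into a 1x1 pointwise conv,
    one output row at a time, so the full intermediate feature map is
    never materialized; only a single-row buffer is kept.

    x:    input feature map, shape (C_in, H, W), already zero-padded
    dw_w: depthwise weights, shape (C_in, 3, 3)
    pw_w: pointwise weights, shape (C_out, C_in)
    """
    c_in, h, w = x.shape
    c_out = pw_w.shape[0]
    out = np.zeros((c_out, h - 2, w - 2))
    for r in range(h - 2):                 # slide a 3-row window down the input
        row_buf = np.zeros((c_in, w - 2))  # depthwise result for this row only
        for c in range(c_in):
            for j in range(w - 2):
                row_buf[c, j] = np.sum(x[c, r:r + 3, j:j + 3] * dw_w[c])
        out[:, r, :] = pw_w @ row_buf      # pointwise consumes the row at once
    return out
```

On an FPGA the row buffer would live in on-chip BRAM; consuming each depthwise row immediately is what allows a fused schedule to cut on-chip memory requirements and off-chip memory accesses, as the abstract claims for the proposed framework.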
Keywords/Search Tags: algorithm-hardware acceleration, convolutional neural network, depthwise separable convolution, FPGA, lightweight convolutional accelerator, storage optimization method