Research On Customized And Collaborative Design Of Convolutional Neural Network Accelerator Based On FPGA

Posted on:2023-11-17

Degree:Master

Type:Thesis

Country:China

Candidate:J N Guo

GTID:2558307100975809

Subject:Computer Science and Technology

Abstract/Summary:

Over the past decade, advances in computer architecture have greatly improved computing performance, and research on Convolutional Neural Networks (CNNs) has progressed rapidly as a result. As systems become more intelligent, a major challenge is to optimize CNN deployment so that it meets the requirements of low latency, low power consumption, and high performance on resource-constrained edge devices. Compared with Graphics Processing Units (GPUs), the usual platform for CNN research, Field Programmable Gate Arrays (FPGAs) offer high-performance computing together with hardware reconfigurability, which makes them better suited to CNN inference deployment. Current customized computing technology can produce reconfigurable hardware accelerators for some practical applications. CNNs, however, have a large number of parameters, a heavy computational load, and strong data dependence, so existing FPGA accelerators are difficult to implement and adapt poorly to CNN algorithms. The result is large gaps in computation, wasted latency, and low utilization of computing resources, so these accelerators cannot meet the requirements of CNN applications.

In view of these problems, this thesis uses customized computing technology to study the design of CNN accelerators with high adaptability and performance. From the perspectives of specialized parallel optimization within a computation layer and co-design between layers, it studies Compute Unit design, design space exploration, and collaborative solution design. The main contributions are as follows.

(1) Multiple Size of Compute Unit (MSCU) is proposed. To address the redundant computation common to CNN loop-tiling accelerators, MSCU uses scheduling simulation to find the optimal Compute Unit configuration for handling the redundant computation that arises at feature-data boundaries. Experimental results show that, compared with a single-Compute-Unit accelerator, MSCU improves average speed by about 1.43 times and reduces the redundant computation rate by 30% on average.

(2) Ultra accelerator (Ultra Acc), based on a dataflow architecture, is proposed. To address the high data dependence caused by local parameter sharing in CNN algorithms, the Compute Unit is customized from the bottom up within the resource limits of the FPGA platform, and the dataflow structure is reorganized to suit parallel optimization of the CNN algorithm. To adapt Ultra Acc to specific CNN applications and reach its best performance, an evaluation model is established to explore the design space. Experiments show that LeNet on Ultra Acc reaches a latency of 192 μs under 8-bit quantization, 31.9% faster than previous work.

(3) A customized software/hardware design of Ultra Acc is developed for DAC-SDC, the UAV object detection contest of the Design Automation Conference (a CCF A conference). Collaborative, application-specific design (co-design) is the key to high performance in FPGA-based CNN accelerators. Based on the characteristics of the DAC-SDC application, the Ultranet-dp acceleration solution is designed with a customized neural network and a customized accelerator. Its average throughput on the Ultra96 V2 is 133.43 GOPs, and the FPGA runs 6.39 times as fast as the CPU and 1.39 times as fast as the GPU.
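The boundary redundancy that motivates MSCU can be illustrated with a toy model. The sketch below is not the thesis's scheduling simulation; it only assumes a square feature map of side `n` covered by square tiles, and the function names, the 28-wide feature map, and the tile sizes 8 and 4 are all hypothetical choices made for illustration. It compares the padded work done by a single tile size with the work done when a second, smaller tile size is available for the boundary.

```python
from typing import Sequence


def min_padded_length(n: int, sizes: Sequence[int]) -> int:
    """Smallest total length >= n that can be formed as a sum of tile sizes.

    Hypothetical helper: models covering one dimension of a feature map
    with compute-unit tiles of the given sizes.
    """
    limit = n + max(sizes)  # the largest size alone always lands below this
    reachable = [False] * (limit + 1)
    reachable[0] = True
    for total in range(1, limit + 1):
        reachable[total] = any(
            total >= s and reachable[total - s] for s in sizes
        )
    return next(t for t in range(n, limit + 1) if reachable[t])


def redundancy_rate(n: int, sizes: Sequence[int]) -> float:
    """Fraction of computed positions that fall outside an n x n feature map."""
    padded = min_padded_length(n, sizes)
    return 1.0 - (n * n) / (padded * padded)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the thesis.
    print(redundancy_rate(28, (8,)))    # single tile size
    print(redundancy_rate(28, (8, 4)))  # two tile sizes
```

In this toy setting a 28-wide feature map tiled only with size 8 is padded to 32, so part of the computation is wasted, while adding a size-4 tile covers 28 exactly. A real accelerator would also have to weigh the FPGA resources consumed by each additional Compute Unit size, which this sketch ignores.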

Keywords/Search Tags:

Accelerator, Customized Design, Hardware-Software Collaboration Design, CNN, FPGA

Related items

1	Research On Image Recognition Acceleration Method Based On FPGA Hardware And Software Collaboration
2	Research On Software-hardware Co-design Method Of Homomorphic Encryption Accelerator Based On FPGA
3	Research On Detection Technique Of Unusual Behavior Of Workshop Workers Based On FPGA
4	Algorithm Of SVD Compressing Convolutional Neural Networks And Hardware Accelerator Design
5	Research On Software And Hardware Co-design Method Of Deep Neural Network Accelerator
6	AAC Decoder Based On The Hardware And Software Co-design, Development And Implementation
7	Design And Research Of Hardware Accelerator On H.264 Encoder Based On ARM ESL Platform
8	Research And Implementation Of Convolutional Neural Network Accelerator Based On FPGA
9	Research On Neural Network Accelerator Customization Method For Large-scale Reconfigurable Hardware
10	Design Of Hardware Accelerator Based On FPGA For Convolutional Neural Networks