
Research On Customized And Collaborative Design Of Convolutional Neural Network Accelerator Based On FPGA

Posted on: 2023-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: J N Guo
Full Text: PDF
GTID: 2558307100975809
Subject: Computer Science and Technology
Abstract/Summary:
Over the past decade, advances in computer architecture have greatly improved computing performance, and research on Convolutional Neural Networks (CNNs) has developed rapidly now that sufficient computing power is available. As systems become more intelligent, it is a great challenge to optimize the design of CNN deployment systems so that they meet the requirements of low latency, low power consumption and high performance on resource-limited edge devices. Compared with the Graphics Processing Unit (GPU), the usual platform for CNN research, the Field Programmable Gate Array (FPGA) offers high-performance computing capability together with hardware reconfigurability, which makes it better suited to running inference models in CNN deployment systems. Current customized computing technology can be used to design reconfigurable hardware accelerators for some practical applications. CNN applications, however, have a large number of parameters, a heavy computational load and high data dependence, so existing FPGA accelerators are difficult to implement and adapt poorly to CNN algorithms. This leads to large computing gaps, unnecessary latency and low utilization of computing resources, and such accelerators cannot meet the requirements of CNN applications.

To address these problems, this thesis uses customized computing technology to study the design of CNN accelerators with high adaptability and performance. From the perspectives of specialized parallel optimization within a computation layer and co-design between layers, it studies Compute Unit design, design space exploration and collaborative solution design. The main work is summarized as follows:

Multiple Size of Compute Unit (MSCU) is proposed. To address the redundant computation that is common in CNN loop-tiling accelerators, MSCU uses scheduling simulation to find the optimal Compute Unit configuration for handling the redundant computation that arises at feature-data boundaries. Experimental results show that, compared with a single-Compute-Unit accelerator, MSCU increases average speed by about 1.43 times and reduces the redundant computation rate by 30% on average.

Ultra accelerator (Ultra Acc), based on a dataflow architecture, is proposed. To address the high data dependence caused by local parameter sharing in CNN algorithms, Ultra Acc customizes the Compute Unit from the bottom up under the resource limits of the FPGA platform and reorganizes the dataflow structure to suit the parallel optimization of CNN algorithms. To adapt Ultra Acc to specific CNN applications and reach its optimal performance, an evaluation model is established to explore the design space. Experiments show that LeNet on Ultra Acc reaches a latency of 192 μs under 8-bit quantization, 31.9% faster than previous work.

A customized software/hardware design of Ultra Acc is developed for the DAC-SDC application, the UAV object detection contest of the Design Automation Conference (a CCF A conference). Collaborative custom application design (co-design) is the key to achieving high performance in FPGA-based CNN accelerators. Based on the characteristics of the DAC-SDC application, the Ultranet-dp acceleration solution is designed with a customized neural network and a customized accelerator. Its average throughput on the Ultra96 V2 is 133.43 GOPs. In execution speed, the FPGA is 6.39 times as fast as the CPU and 1.39 times as fast as the GPU.
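
The boundary redundancy that MSCU targets can be illustrated with a small sketch. It is not the thesis's scheduling simulation: the function names, the 13x13 feature map and the tile sizes 8, 4 and 2 are all assumptions chosen for illustration. The sketch compares how much padded (redundant) work a single fixed tile size incurs against a set of several tile sizes when covering one feature-map dimension.

```python
from math import ceil


def padded_length(n, tile):
    """Length actually computed when a dimension of size n is covered
    by tiles of one fixed size (the last tile may overrun the boundary)."""
    return ceil(n / tile) * tile


def padded_length_multi(n, tiles):
    """Smallest total length >= n that can be built from the given tile
    sizes, found by dynamic programming over the remaining length."""
    best = [0] * (n + 1)
    for k in range(1, n + 1):
        # Place one tile of size t, then cover whatever is still left.
        best[k] = min(t + best[max(0, k - t)] for t in tiles)
    return best[n]


def redundancy_rate(useful, computed):
    """Fraction of the computed elements that fall outside the real data."""
    return (computed - useful) / computed


if __name__ == "__main__":
    size = 13                      # hypothetical feature-map height and width
    useful = size * size

    single = padded_length(size, 8) ** 2
    multi = padded_length_multi(size, (8, 4, 2)) ** 2

    print("single tile size 8 :", redundancy_rate(useful, single))
    print("tile sizes 8, 4, 2 :", redundancy_rate(useful, multi))
```

In this toy setting the single 8-wide tile pads each dimension from 13 to 16, while the mixed sizes pad it only to 14, which is the kind of boundary waste that multiple Compute Unit sizes are meant to reduce. The real design must also weigh FPGA resource cost and scheduling, which this sketch ignores.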
Keywords/Search Tags: Accelerator, Customized Design, Hardware-Software Collaboration Design, CNN, FPGA