
Research on Edge-Oriented FPGA Software-Hardware Collaborative Convolutional Network Acceleration

Posted on: 2024-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: C R Shu
Full Text: PDF
GTID: 2568307136990439
Subject: Information networks
Abstract/Summary:
Field Programmable Gate Arrays (FPGAs) have shown great potential in accelerating convolutional neural network inference. However, the growing size of neural network models and the limited resources of edge FPGAs pose three challenges. First, traditional accelerators rely heavily on the high-speed communication interfaces and abundant storage of server clusters, which conflicts with the limited storage and bandwidth of edge devices, in particular their lack of support for shared storage structures. Second, the multiplication resources demanded by parallel computing exceed what edge devices can supply. Third, the floating-point arithmetic used in convolution is mismatched with the fixed-point processing engines of FPGAs, resulting in low computational efficiency. To address these issues, this thesis investigates three aspects: software-hardware interaction at the edge, weight encoding, and mixed-precision optimization. The main contributions are as follows:

(1) To tackle the inefficiency of shared storage mapping in edge scenarios, this thesis proposes a rapid shared storage mapping framework that achieves efficient mapping through bidirectional software-hardware storage scheduling. On the software side, kernel-side storage mapping and signal control minimize latency and eliminate redundant kernel copies. On the hardware side, a scheduler integrates interrupts, register mapping, and Direct Memory Access (DMA) for storage scheduling. Compared with the Python Productivity for Zynq (PYNQ) framework (a baseline transfer is sketched after this abstract), this design achieves an average 2x increase in transfer speed and up to a 46.38x improvement for small data transfers.

(2) To address the excessive consumption of multiplication resources in parallel neural network computation, this thesis proposes a clustering-hash method that shrinks the computation channels of the input data, reducing multiplier usage at the same degree of parallelism. On the software side, K-means clustering compresses the set of distinct weight values down to the number of parallel input channels. On the hardware side, a hash function accumulates the inputs whose weights fall into the same cluster within a channel before they enter the multiplier (see the clustering sketch below). Simulation and deployment tests show that the method saves more than half of the multiplication resources and weight storage at the same degree of parallelism.

(3) To address the mismatch between floating-point computation and the fixed-point processing engines of the hardware, this thesis proposes a mixed-precision quantization method that improves computation throughput with minimal accuracy loss. On the software side, low-precision quantization serves as the baseline, and enhancement quantization is applied to a subset of weights to recover inference accuracy (see the quantization sketch below). On the hardware side, a precision-matched computation matrix combined with the corresponding quantization scheme raises system throughput. Experiments show that the accelerator with mixed-precision quantization achieves a 0.76x throughput improvement over comparable work while keeping the accuracy loss below 1%.
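For contribution (1), the thesis's own framework is a custom kernel-side mapping and hardware scheduler, which PYNQ does not expose; as context for the reported comparison, a minimal PYNQ-style DMA round trip on a Zynq device looks roughly like the sketch below. The bitstream name, DMA instance name, buffer sizes, and data type are illustrative assumptions, not taken from the thesis, and the code only runs on a PYNQ-enabled board.

```python
import numpy as np
from pynq import Overlay, allocate

# Load a bitstream and grab its AXI DMA engine; the overlay and IP names
# ("conv_accel.bit", "axi_dma_0") are placeholders, not from the thesis.
overlay = Overlay("conv_accel.bit")
dma = overlay.axi_dma_0

# Contiguous, physically mapped buffers that the DMA engine can address.
in_buf = allocate(shape=(4096,), dtype=np.int16)
out_buf = allocate(shape=(4096,), dtype=np.int16)
in_buf[:] = np.random.randint(-128, 127, size=4096, dtype=np.int16)

# One round trip: push the input tile to the accelerator, pull the result back.
dma.sendchannel.transfer(in_buf)
dma.recvchannel.transfer(out_buf)
dma.sendchannel.wait()
dma.recvchannel.wait()
```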
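For contribution (2), the following NumPy/scikit-learn sketch illustrates the arithmetic behind the clustering-hash idea: the distinct weight values are compressed into a small K-means codebook offline, inputs sharing a codebook entry are summed first, and only one multiplication per centroid remains. The function name and cluster count are illustrative assumptions, and the per-cluster accumulation done here in software is what the thesis performs with a hardware hash function.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_dot(weights, inputs, n_clusters=16, seed=0):
    """Approximate dot(weights, inputs) with only n_clusters multiplications."""
    # Offline: learn the codebook and the cluster index of every weight.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(weights.reshape(-1, 1))
    codebook = km.cluster_centers_.ravel()

    # Online: accumulate inputs per cluster (additions only), then one
    # multiply per centroid instead of one per weight.
    per_cluster_sum = np.zeros(n_clusters)
    np.add.at(per_cluster_sum, labels, inputs)
    return float(codebook @ per_cluster_sum)

# Quick check against the exact dot product.
rng = np.random.default_rng(0)
w = rng.normal(size=256)
x = rng.normal(size=256)
print(w @ x, clustered_dot(w, x))
```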
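For contribution (3), a minimal sketch of mixed-precision weight quantization follows, assuming symmetric uniform quantization, a 4-bit base precision, an 8-bit enhanced precision, and magnitude-based selection of the enhanced weights; none of these specifics are stated in the abstract and they stand in for the thesis's actual configuration.

```python
import numpy as np

def quantize_symmetric(x, n_bits):
    """Uniform symmetric fake-quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values, used to measure the precision loss

def mixed_precision_quantize(weights, low_bits=4, high_bits=8, enhance_ratio=0.1):
    """Quantize most weights at low_bits; re-quantize ("enhance") the
    largest-magnitude fraction at high_bits. Ratio and bit-widths are
    illustrative assumptions, not the thesis's reported settings."""
    out = quantize_symmetric(weights, low_bits)
    k = max(1, int(enhance_ratio * weights.size))
    idx = np.argsort(np.abs(weights).ravel())[-k:]   # weights kept at high precision
    flat, flat_w = out.ravel(), weights.ravel()
    flat[idx] = quantize_symmetric(flat_w[idx], high_bits)
    return flat.reshape(weights.shape)

w = np.random.default_rng(1).normal(size=(64, 64))
w_q = mixed_precision_quantize(w)
print("mean abs quantization error:", np.mean(np.abs(w - w_q)))
```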
Keywords/Search Tags:Hardware-Software Co-Design, FPGA, Hardware Acceleration, Convolutional Neural Network, Quantization