
Research on Edge-Oriented FPGA Software-Hardware Collaborative Convolutional Network Acceleration

Posted on: 2024-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: C R Shu
Full Text: PDF
GTID: 2568307136990439
Subject: Information networks
Abstract/Summary:
Field Programmable Gate Arrays (FPGAs) have shown great potential in accelerating convolutional neural network inference. However, the growing size of neural network models and the limited resources of edge FPGAs pose three challenges. First, traditional accelerators rely heavily on the high-speed communication interfaces and abundant storage of server clusters, which conflicts with the limited storage and bandwidth of edge devices, in particular their lack of support for shared storage structures. Second, the multiplication resources demanded by parallel computing exceed what edge devices can supply. Third, the floating-point arithmetic used in convolution is mismatched with the fixed-point processing engines of FPGAs, resulting in low computational efficiency. To address these issues, this thesis investigates three aspects: software-hardware interaction at the edge, weight encoding, and mixed-precision optimization. The main contributions are as follows:

(1) To tackle the inefficiency of shared storage mapping in edge scenarios, this thesis proposes a rapid shared storage mapping framework that achieves efficient mapping through bidirectional software-hardware storage scheduling. On the software side, kernel-side storage mapping and signal control minimize latency and eliminate redundant kernel copies. On the hardware side, a scheduler integrates interrupts, register mapping, and Direct Memory Access (DMA) for storage scheduling. Compared with the Python Productivity for Zynq (PYNQ) framework (a baseline transfer is sketched after this abstract), this design achieves an average 2x increase in transfer speed and up to a 46.38x improvement for small data transfers.

(2) To address the excessive consumption of multiplication resources in parallel neural network computation, this thesis proposes a clustering-hash method that shrinks the computation channels of the input data, reducing multiplier usage at the same degree of parallelism. On the software side, K-means clustering compresses the set of distinct weight values down to the number of parallel input channels. On the hardware side, a hash function accumulates the inputs whose weights fall into the same cluster within a channel before they enter the multiplier (see the clustering sketch below). Simulation and deployment tests show that the method saves more than half of the multiplication resources and weight storage at the same degree of parallelism.

(3) To address the mismatch between floating-point computation and the fixed-point processing engines of the hardware, this thesis proposes a mixed-precision quantization method that improves computation throughput with minimal accuracy loss. On the software side, low-precision quantization serves as the baseline, and enhancement quantization is applied to a subset of weights to recover inference accuracy (see the quantization sketch below). On the hardware side, a precision-matched computation matrix combined with the corresponding quantization scheme raises system throughput. Experiments show that the accelerator with mixed-precision quantization achieves a 0.76x throughput improvement over comparable work while keeping the accuracy loss below 1%.
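For contribution (1), the thesis's own framework is a custom kernel-side mapping and hardware scheduler, which PYNQ does not expose; as context for the reported comparison, a minimal PYNQ-style DMA round trip on a Zynq device looks roughly like the sketch below. The bitstream name, DMA instance name, buffer sizes, and data type are illustrative assumptions, not taken from the thesis, and the code only runs on a PYNQ-enabled board.

```python
import numpy as np
from pynq import Overlay, allocate

# Load a bitstream and grab its AXI DMA engine; the overlay and IP names
# ("conv_accel.bit", "axi_dma_0") are placeholders, not from the thesis.
overlay = Overlay("conv_accel.bit")
dma = overlay.axi_dma_0

# Contiguous, physically mapped buffers that the DMA engine can address.
in_buf = allocate(shape=(4096,), dtype=np.int16)
out_buf = allocate(shape=(4096,), dtype=np.int16)
in_buf[:] = np.random.randint(-128, 127, size=4096, dtype=np.int16)

# One round trip: push the input tile to the accelerator, pull the result back.
dma.sendchannel.transfer(in_buf)
dma.recvchannel.transfer(out_buf)
dma.sendchannel.wait()
dma.recvchannel.wait()
```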
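For contribution (2), the following NumPy/scikit-learn sketch illustrates the arithmetic behind the clustering-hash idea: the distinct weight values are compressed into a small K-means codebook offline, inputs sharing a codebook entry are summed first, and only one multiplication per centroid remains. The function name and cluster count are illustrative assumptions, and the per-cluster accumulation done here in software is what the thesis performs with a hardware hash function.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_dot(weights, inputs, n_clusters=16, seed=0):
    """Approximate dot(weights, inputs) with only n_clusters multiplications."""
    # Offline: learn the codebook and the cluster index of every weight.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(weights.reshape(-1, 1))
    codebook = km.cluster_centers_.ravel()

    # Online: accumulate inputs per cluster (additions only), then one
    # multiply per centroid instead of one per weight.
    per_cluster_sum = np.zeros(n_clusters)
    np.add.at(per_cluster_sum, labels, inputs)
    return float(codebook @ per_cluster_sum)

# Quick check against the exact dot product.
rng = np.random.default_rng(0)
w = rng.normal(size=256)
x = rng.normal(size=256)
print(w @ x, clustered_dot(w, x))
```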
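For contribution (3), a minimal sketch of mixed-precision weight quantization follows, assuming symmetric uniform quantization, a 4-bit base precision, an 8-bit enhanced precision, and magnitude-based selection of the enhanced weights; none of these specifics are stated in the abstract and they stand in for the thesis's actual configuration.

```python
import numpy as np

def quantize_symmetric(x, n_bits):
    """Uniform symmetric fake-quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values, used to measure the precision loss

def mixed_precision_quantize(weights, low_bits=4, high_bits=8, enhance_ratio=0.1):
    """Quantize most weights at low_bits; re-quantize ("enhance") the
    largest-magnitude fraction at high_bits. Ratio and bit-widths are
    illustrative assumptions, not the thesis's reported settings."""
    out = quantize_symmetric(weights, low_bits)
    k = max(1, int(enhance_ratio * weights.size))
    idx = np.argsort(np.abs(weights).ravel())[-k:]   # weights kept at high precision
    flat, flat_w = out.ravel(), weights.ravel()
    flat[idx] = quantize_symmetric(flat_w[idx], high_bits)
    return flat.reshape(weights.shape)

w = np.random.default_rng(1).normal(size=(64, 64))
w_q = mixed_precision_quantize(w)
print("mean abs quantization error:", np.mean(np.abs(w - w_q)))
```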
Keywords/Search Tags:Hardware-Software Co-Design, FPGA, Hardware Acceleration, Convolutional Neural Network, Quantization