| With the rapid development of the big data era, image recognition technology based on Convolutional Neural Networks (CNNs) has been widely applied in various fields. CNNs typically require significant computational and storage resources for training and inference, so running compute- and storage-intensive CNN algorithms on resource-scarce devices remains challenging. At present, most CNN hardware acceleration techniques based on Field Programmable Gate Arrays (FPGAs) do not follow a co-design methodology, resulting in a mismatch between the upper-level software algorithm and the underlying hardware. Therefore, this thesis takes a hardware-software co-design approach and designs two high-performance acceleration methods suitable for image recognition tasks by combining pruning and quantization software compression algorithms with FPGA hardware acceleration architecture design. The main research contents and innovations are as follows: (1) Research on a hardware-software co-acceleration method for CNNs based on iterative pruning. To address the difficulty of deploying CNNs on resource-constrained and power-sensitive edge devices, this thesis proposes an FPGA-based hardware-software co-designed accelerator for sparse neural networks. First, to reduce the storage space occupied by model parameters at the software level, a hardware-friendly software compression algorithm is designed. Model parameters are compressed twice, by an iterative pruning algorithm and by a dynamic fixed-point quantization algorithm, and the compression algorithm is further optimized to fit the hardware architecture. Then, at the hardware level, a novel sparse convolutional neural network accelerator architecture is designed, with the hardware acceleration scheme tailored to the characteristics of the software compression algorithm. The proposed sparse neural network acceleration method is implemented and deployed on the Xilinx
PYNQ-Z2 platform, and the effectiveness of the proposed algorithm is verified through recognition experiments with the LeNet, SENet, AlexNet, and FPNet image recognition models on several classical datasets. The results confirm that, at a 100 MHz operating frequency, the proposed method achieves significant compression and acceleration ratios compared with mainstream hardware acceleration methods. (2) A hardware-software co-acceleration method for MobileNet networks based on a KL divergence quantization algorithm. This thesis proposes a scalable, high-performance accelerator for the lightweight MobileNet network, which is built on Depthwise Separable Convolution (DSC), to achieve energy-efficient acceleration of image recognition tasks. At the software level, a quantization algorithm based on KL divergence is designed to compress the model weights and activations to 8 bits, reducing the computation and storage required by the model. At the hardware level, a highly flexible hardware accelerator architecture is proposed, with a dynamically reconfigurable computation engine and an adaptive data flow scheduling model based on block convolution, to provide a trade-off between hardware resources and processing speed. At the system level, design space exploration is used to obtain the optimal loop tiling configuration and a reasonable resource allocation for each layer, maximizing computing performance. Finally, a hardware accelerator incorporating the proposed architecture and hardware-software co-acceleration strategy is implemented on a low-density FPGA (Xilinx ZYNQ) platform. It achieves higher frame rates and DSP computational efficiency than mainstream FPGA hardware acceleration techniques. From the perspective of hardware-software co-design, this thesis designs two general-purpose acceleration methods for image recognition, addressing the problem that CNNs with huge parameter counts are difficult to deploy in mobile application scenarios
in the image recognition field. The hardware accelerators proposed in this thesis are reconfigurable and can be used for different image recognition tasks on low-density devices, which helps promote the practical application of artificial intelligence. |
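
The two-stage compression described in contribution (1), pruning followed by dynamic fixed-point quantization, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the sparsity schedule, the per-tensor choice of fractional length, and the example weights are all assumptions made for illustration, and the retraining that normally follows each pruning round is omitted.

```python
import math


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to zero
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def iterative_prune(weights, schedule):
    """Apply increasing sparsity targets one round at a time.

    Assumption: a real flow would fine-tune the network between rounds;
    that step is left out of this sketch.
    """
    pruned = list(weights)
    for sparsity in schedule:
        pruned = magnitude_prune(pruned, sparsity)
    return pruned


def dynamic_fixed_point(weights, bits=8):
    """Quantize to signed `bits`-bit integers with a per-tensor fractional length.

    The fractional length is chosen so that the largest magnitude still fits,
    which is one common reading of "dynamic fixed-point" (an assumption here).
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0] * len(weights), 0
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits left of the binary point
    frac_bits = bits - 1 - int_bits                # one bit is reserved for the sign
    scale = 2.0 ** frac_bits
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    quantized = [min(hi, max(lo, round(w * scale))) for w in weights]
    return quantized, frac_bits


def dequantize(quantized, frac_bits):
    """Map fixed-point integers back to floating point."""
    return [q / 2.0 ** frac_bits for q in quantized]


# Hypothetical layer weights, used only to exercise the functions.
layer = [0.5, -0.25, 0.03, 0.9, -0.01, 0.6]
sparse = iterative_prune(layer, schedule=[0.25, 0.5])
codes, frac_bits = dynamic_fixed_point(sparse, bits=8)
restored = dequantize(codes, frac_bits)
```

Pruning first and quantizing second means the fractional length is fitted only to the weights that survive, and the zeros introduced by pruning remain exactly zero after quantization, which is what a sparse accelerator would rely on to skip them.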