In recent years, with the continuous development of deep learning research, network models represented by convolutional neural networks (CNNs) have achieved great success in fields such as natural language processing, autonomous driving, and computer vision. However, these successes rely on huge numbers of parameters and large amounts of computation, which makes many practical applications difficult to deploy in specific scenarios. This poses two challenges: first, to significantly reduce the computation and parameter count of a model without significantly reducing its accuracy; second, to accelerate the model and improve its performance on resource-constrained hardware platforms. Much prior work has explored model compression and acceleration, but most of it studies compression or acceleration in isolation. To address these issues, this thesis focuses on convolutional neural network model compression and hardware acceleration. The main work is as follows:

1. First, this thesis studies and implements a channel pruning algorithm with independent per-layer thresholds. It uses the scaling coefficients of the network's BN layers as an indicator of the importance of each channel in the feature map, tests the sensitivity of each network layer to pruning, sets an independent pruning threshold for each layer according to its sensitivity curve, and prunes the corresponding convolution kernels and their connections. The algorithm was tested on the VGG-16 and MobileNet networks. With accuracy drops of only 0.24% and 0.6% respectively, the parameter count of VGG-16 was reduced by 95.52% and its FLOPs by 59.55%, while the parameter count of MobileNet was reduced by 83.83% and its FLOPs by 86.33%.

2. For quantization schemes that actually deploy models to hardware, post-training quantization (PTQ) algorithms are commonly used; although they are simple and convenient to apply, they incur significant accuracy losses. This thesis designs a quantization method based on QIO (integer-arithmetic-only), which inserts pseudo-quantization nodes during training and uses integer computation during inference. This method simulates the actual deployment errors during software training. Experimental results show that the accuracy error does not exceed 1%.

3. Finally, an accelerator for the compressed CNN is designed and validated on an FPGA. A dedicated array is designed to handle the large number of convolution operations, and loop interchange is used to increase data reuse and optimize data access. During data caching, parallel reads and writes are used for acceleration, while finite state machines and pipelining are used to control the execution flow on the FPGA. Intel FPGA devices and the corresponding EDA tools are then used to implement and synthesize the accelerator, and a test platform is built to simulate and verify it. The experimental results show that the recognition accuracy of the FPGA accelerator differs from that of the quantized software model by only 0.61%; its performance is about 77.15 GOPS, and its energy-efficiency ratio is 7.17 GOPS/W. Compared with traditional hardware accelerators and FPGA accelerators designed in other literature, the FPGA accelerator designed in this thesis for compressed convolutional neural networks achieves better acceleration results.
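The channel selection step of the pruning algorithm in contribution 1 can be sketched as follows. This is a minimal illustration, assuming each layer's BN scaling factors are available as plain lists; the layer names, gamma values, and thresholds below are hypothetical, not the thesis's measured values.

```python
# Sketch of channel selection by BN scaling factor: keep a channel only if
# the absolute value of its BN gamma exceeds that layer's independent
# threshold (derived offline from the layer's pruning-sensitivity curve).

def prune_masks(bn_gammas, thresholds):
    """Return a per-layer boolean keep-mask over channels."""
    masks = {}
    for layer, gammas in bn_gammas.items():
        t = thresholds[layer]
        masks[layer] = [abs(g) > t for g in gammas]
    return masks

# Hypothetical BN scale factors and per-layer thresholds for illustration.
bn_gammas = {
    "conv1": [0.9, 0.02, 0.5, 0.01],
    "conv2": [0.3, 0.25, 0.04],
}
thresholds = {"conv1": 0.05, "conv2": 0.1}

masks = prune_masks(bn_gammas, thresholds)
for layer, mask in masks.items():
    print(layer, f"keep {sum(mask)}/{len(mask)} channels")
```

In the full algorithm, the convolution kernels (and their downstream connections) corresponding to the `False` entries would then be physically removed from the network.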
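The quantization idea in contribution 2 can be sketched as below, assuming the common 8-bit affine scheme: a quantize-dequantize ("fake quantization") node exposes rounding error during training, while at inference only integer arithmetic is used, with the floating-point rescale replaced by a fixed-point multiplier. The thesis's exact QIO node placement and bit widths are not given in the abstract, so all parameters here are illustrative.

```python
# Training-time fake quantization and an inference-time integer-only
# multiply, using an 8-bit affine scheme (scale + zero point). Values
# below are made-up examples, not the thesis's configuration.

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

def fake_quant(x, scale, zero_point):
    """Training node: the forward pass sees the same rounding error
    that real integer deployment would introduce."""
    return dequantize(quantize(x, scale, zero_point), scale, zero_point)

def int_mul(q1, q2, zp1, zp2, zp3, multiplier, shift):
    """Inference: the float rescale is pre-approximated as an integer
    multiplier plus a right shift, so no float ops are needed."""
    acc = (q1 - zp1) * (q2 - zp2)      # 32-bit integer accumulate
    acc = (acc * multiplier) >> shift  # fixed-point rescale
    return max(0, min(255, acc + zp3))

x = 0.37
xq = fake_quant(x, scale=0.01, zero_point=128)
print(xq)  # close to 0.37, but carrying the deployment rounding error
```

Training against `fake_quant` outputs is what lets the software model predict the accuracy of the deployed integer pipeline to within the reported 1%.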
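The data-reuse effect of the loop interchange in contribution 3 can be modeled in software as follows. This is a behavioral sketch, not the accelerator's RTL: it assumes (illustratively) a weight-stationary order in which each kernel weight is fetched once and streamed across all output positions, and the image and kernel shapes are made up.

```python
# Behavioral model of a convolution loop nest before and after loop
# interchange. Both produce identical results; the reordered version
# reuses each weight across the whole output plane, which is the kind
# of data reuse the accelerator's compute array exploits.

def conv2d_naive(img, ker):
    H, W, K = len(img), len(img[0]), len(ker)
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            for u in range(K):
                for v in range(K):
                    out[i][j] += img[i + u][j + v] * ker[u][v]
    return out

def conv2d_reordered(img, ker):
    """Same arithmetic, kernel loops outermost: ker[u][v] is loaded
    once per (u, v) instead of once per output pixel."""
    H, W, K = len(img), len(img[0]), len(ker)
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for u in range(K):
        for v in range(K):
            w = ker[u][v]  # weight held stationary for the inner loops
            for i in range(H - K + 1):
                for j in range(W - K + 1):
                    out[i][j] += img[i + u][j + v] * w
    return out

img = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
ker = [[1.0, 0.0], [0.0, -1.0]]
assert conv2d_naive(img, ker) == conv2d_reordered(img, ker)
```

On the FPGA, the hardware analogue of the inner two loops is the compute array processing many output positions in parallel, with the finite state machine sequencing the outer loops and the double-buffered caches overlapping reads and writes.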