In recent years, Convolutional Neural Networks (CNNs) have achieved breakthroughs in computer vision, natural language processing, and other fields, greatly promoting the development of artificial intelligence. As their performance has improved, CNNs have come to match or even exceed human performance in some tasks, with a profound effect on daily life. However, increasingly complex network structures have made the number of parameters and the amount of computation in CNNs very large. This hinders their deployment on embedded terminals with low power budgets and limited hardware resources, and to some extent limits the further development of CNNs. To address these problems, CNN compression and acceleration based on low-rank decomposition with SVD (Singular Value Decomposition) is studied from both the algorithm and the hardware perspective. On the algorithm side, SVD is used to compress the convolutional neural networks VGG16 and ResNet50. By analyzing the structural characteristics of these two networks, the method for selecting the SVD rank is optimized and the SVD decomposition strategy is improved. After SVD compression, the compression rate and acceleration rate of VGG16 and ResNet50 both exceed 3x, and after retraining the compressed networks the Top-1 and Top-5 accuracy losses are less than 2.00% and 1.00%, respectively. On the hardware side, a CNN hardware acceleration system based on SVD is designed. By analyzing the characteristics of CNN convolution and pooling after SVD compression, the convolution module is designed to use multiple weights and the pooling module to perform dimension-reduction computation. The data storage scheme is optimized for efficient CNN operation, and the feature maps and convolution weights of the CNN are partitioned appropriately. The convolution operations are executed in
parallel to realize multi-channel input and output computation, yielding a high-performance CNN hardware acceleration system. The system is validated on the Ultra96-v2 development board, and the test results show that at an operating frequency of 200 MHz the throughput is 87.20 GOPS, with a power consumption of 2.37 W and a performance-to-power ratio of 36.3 GOPS/W, which meets the design requirements. This research provides some reference value for future work on low-rank decomposition algorithms for CNN model compression.
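
The abstract does not spell out the optimized rank selection or the improved decomposition strategy, so the following is only a minimal sketch of plain truncated SVD applied to a single weight matrix, assuming NumPy. The names `svd_compress` and `compression_rate`, the 256 x 512 layer size, and the rank of 8 are hypothetical choices for illustration and are not taken from the work itself.

```python
import numpy as np


def svd_compress(weight, rank):
    """Factor an (m x n) weight matrix into (m x rank) and (rank x n) factors."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # absorb the kept singular values into the left factor
    right = vt[:rank, :]
    return left, right


def compression_rate(m, n, rank):
    """Ratio of original parameter count to the parameter count of the two factors."""
    return (m * n) / (rank * (m + n))


# Hypothetical layer: a 256 x 512 matrix constructed to have rank 8,
# so a rank-8 truncation reconstructs it up to floating-point error.
rng = np.random.default_rng(0)
weight = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 512))

left, right = svd_compress(weight, rank=8)
rel_error = np.linalg.norm(weight - left @ right) / np.linalg.norm(weight)
rate = compression_rate(256, 512, 8)
```

Replacing one layer with the two factors turns a single matrix product into two smaller ones, which is where both the parameter and the computation savings come from. Real trained weights are rarely exactly low rank, so in practice the truncation introduces an approximation error that retraining is meant to recover.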