In recent years, the world has witnessed the success of a wide range of computer vision tasks, e.g., image classification, object detection, and video segmentation, owing to the development of deep neural networks, especially convolutional neural networks (CNNs). However, as vision tasks grow more demanding, CNN models have become ever larger, imposing a heavy storage burden on training devices. Embedding these large, high-performance models into resource-constrained platforms is often impractical, which motivates CNN models with smaller memory footprints and lower computational cost that allow fast inference without sacrificing task performance. The deployment of CNNs in practical applications is mostly constrained by model size, run-time memory, and the number of computing operations. Since the computation cost of a CNN is dominated by the convolution operation, which is essentially a dot product between weights and activations, the number of parameters in the model is critical to all three factors. This paper focuses on the compression and acceleration of deep convolutional networks and conducts research on weight and activation quantization, channel pruning, and channel sharing. The main contributions are the following:

(1) A low-bit weight quantization method based on least squares is proposed, which alleviates the accuracy degradation commonly observed in binary and ternary networks. The method uses least squares to make the error between the low-bit weights and the original floating-point weights as small as possible. Experiments show that the proposed method not only achieves low-bit quantization of the network but also performs channel pruning, thereby optimizing the network structure, further compressing the model size, and reducing the computing power required for training.

(2) A sparse quantization method based on spectral clustering is proposed to quantize the network weights while compacting the network structure. The method not only yields a low-bit quantized network, reducing memory and computation costs, but also learns a compact structure (containing a large number of sparse channels) from a complex convolutional network for subsequent channel pruning, thereby greatly reducing the number of parameters and the computation consumed.

(3) A novel training framework for convolutional neural networks is proposed. By combining weight quantization with Sparse Group Lasso regularization, channel pruning and low-bit weight quantization are achieved at the same time. The framework is modeled as a discretely constrained problem and solved with the alternating direction method of multipliers (ADMM). Unlike previous methods, the proposed approach not only reduces the model size and the number of computing operations but also yields a sparse and compact network structure.

(4) A unified framework for sparse and shared weight channels is proposed, combining quantization with the ordered weighted ℓ1 norm (OWL). In particular, unlike conventional parameter-sharing and quantization methods, the framework achieves both parameter sparsity and channel sparsity by setting appropriate hyperparameters. In addition, OWL can identify strongly correlated channels in the four-dimensional weight tensor, and the AP clustering algorithm is used to realize channel sharing.
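To make the OWL regularizer in (4) concrete, the following minimal sketch (PyTorch; the function name owl_penalty, the per-output-channel grouping, and the lambda schedule are illustrative assumptions, not the thesis implementation) computes an ordered weighted ℓ1 penalty over the output channels of a convolutional weight tensor.

```python
import torch

def owl_penalty(weight, lambdas):
    """Ordered weighted l1 (OWL) penalty over output channels (illustrative sketch).

    weight:  4-D conv weight of shape (out_channels, in_channels, kH, kW).
    lambdas: non-negative, non-increasing 1-D tensor of length out_channels.
    Returns sum_i lambdas[i] * n_(i), where n_(1) >= n_(2) >= ... are the
    per-channel l2 norms sorted in descending order.
    """
    channel_norms = weight.flatten(1).norm(p=2, dim=1)          # one norm per output channel
    sorted_norms, _ = torch.sort(channel_norms, descending=True)
    return torch.dot(lambdas, sorted_norms)

# Example: OSCAR-style weights lambda_i = a + b * (C - 1 - i) encourage both
# channel sparsity and equal (clustered) channel norms.
conv_weight = torch.randn(64, 32, 3, 3, requires_grad=True)
C = conv_weight.shape[0]
lambdas = 1e-4 + 1e-5 * torch.arange(C - 1, -1, -1, dtype=conv_weight.dtype)
penalty = owl_penalty(conv_weight, lambdas)
penalty.backward()   # gradient flows back to conv_weight during regularized training
```

With equal lambdas the penalty reduces to a group-ℓ1 term, while a strictly decreasing schedule additionally pushes groups of channels toward equal norms, which is consistent with the correlated-channel identification described above.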
(5) A full-network quantization method based on a Gaussian distribution and discrete state transitions is proposed, and two corresponding fundamental issues are addressed. The first is approximating the activations through low-bit discretization to decrease the computational cost of the network and the memory required for dot products. The second is specifying a weight quantization and update mechanism for discrete weights that avoids gradient mismatch. With quantized low-bit weights and activations, costly full-precision operations can be replaced by shift operations, so both the memory occupied by the model and its computational consumption are greatly reduced.

(6) A channel slimming method based on loss-function sensitivity is proposed. It identifies unimportant channels using a Taylor expansion with respect to the scaling and shifting factors, and removes these channels with a fixed percentage threshold to obtain a compact network that requires fewer parameters and FLOPs. In the experiments, the proposed method is evaluated on the CIFAR datasets with several popular networks, including VGG-19, DenseNet-40, and ResNet-164, and the results demonstrate that it can prune over 70% of the channels and parameters with no performance loss. Moreover, iterative pruning can be applied to obtain an even more compact network.
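As a rough illustration of the Taylor-expansion channel scoring in (6), the sketch below (PyTorch; the names channel_importance and global_prune_mask are hypothetical, and for brevity it scores only the BatchNorm scaling factor, whereas the thesis also uses the shifting factor) estimates each channel's importance as |gamma * dL/dgamma| and builds a global pruning mask at a fixed percentage threshold.

```python
import torch
import torch.nn as nn

def channel_importance(model, data_loader, criterion, device="cpu"):
    """First-order Taylor importance of BatchNorm scaling factors (sketch).

    The loss change caused by zeroing a channel's scale gamma is approximated
    by |gamma * dL/dgamma|, accumulated here over a single batch for brevity.
    """
    model.to(device).train()
    scores = {name: torch.zeros_like(m.weight)
              for name, m in model.named_modules()
              if isinstance(m, nn.BatchNorm2d)}
    for inputs, targets in data_loader:
        model.zero_grad()
        loss = criterion(model(inputs.to(device)), targets.to(device))
        loss.backward()
        for name, m in model.named_modules():
            if isinstance(m, nn.BatchNorm2d):
                scores[name] += (m.weight * m.weight.grad).abs().detach()
        break  # a single batch keeps the illustration short
    return scores

def global_prune_mask(scores, prune_ratio=0.7):
    """Keep channels whose score exceeds the global prune_ratio quantile."""
    all_scores = torch.cat([s.flatten() for s in scores.values()])
    threshold = torch.quantile(all_scores, prune_ratio)
    return {name: s > threshold for name, s in scores.items()}
```

In practice the masked channels would then be physically removed, together with the corresponding filters in adjacent convolutions, and the slimmed network fine-tuned; repeating this cycle is how iterative pruning yields an even more compact network.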