With the rapid development of deep learning, convolutional neural networks (CNNs) have become a hot topic of research and application in academia and industry. The number of network parameters keeps growing, accuracy keeps improving, and the range of applications keeps widening. However, mobile devices are limited by CPU processing speed, memory size, and energy efficiency, so the model must be made as small as possible while its accuracy still meets requirements. The quality of a CNN should therefore be evaluated from two aspects: accuracy and model size. This article uses knowledge density to measure both at once. Knowledge density is the quotient of the amount of knowledge and the number of model parameters: the greater the amount of effective knowledge, the higher the accuracy; the smaller the number of parameters, the faster the model runs and the less computation it requires. This paper finds that filter grafting, model pruning, and knowledge distillation affect knowledge density in different ways.

A convolutional neural network contains invalid filters that contribute little to the model's output yet increase the cost of deployment. A filter grafting method is therefore proposed to improve model accuracy: it activates invalid filters by grafting external knowledge onto them. Filter grafting evaluates the quality of each filter through its information entropy and adaptively determines the proportion of grafted knowledge through a weighting function. In this way it increases the knowledge of the model without changing the model's structure.

Filters in the same layer of a CNN usually share the same shape, yet the different functions performed by the filters in a layer do not all require the full set of parameters. A module named the filter skeleton is therefore proposed to learn the shape of each filter, together with a pruning method built on it. This stripe-wise pruning no longer treats the filter as the basic unit; it divides each filter into "stripes" according to its shape. By transforming standard convolution into stripe-wise convolution, the parameters of each stripe can be slimmed just as in filter pruning, while the model can still be accelerated in a structured manner. Stripe-wise pruning thus removes filter parameters while preserving the knowledge held by the filter.

Knowledge distillation uses a large model, or a combination of several models, as a teacher network that extracts knowledge from the dataset and uses it to supervise a student network, so that the student can learn the teacher's better accuracy. By analyzing a variety of methods, this paper shows that knowledge distillation with a fixed teacher increases the amount of knowledge in the effective filters, thereby improving model accuracy.

These three methods are combined into one framework to improve the knowledge density of the model. Experimental results show that the framework surpasses other state-of-the-art (SOTA) pruning methods in both accuracy and parameter count.
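To make the knowledge-density measure concrete, a minimal formulation (the symbols rho, K, and N are our own notation, not taken from the text) is

    rho = K / N,

where K is the amount of effective knowledge in the model (for example, estimated from the information entropy of its filters) and N is the number of parameters. Filter grafting and distillation aim to raise K, while pruning lowers N, so all three raise rho.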
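The following PyTorch sketch illustrates entropy-based filter grafting. It is a minimal illustration, not the paper's exact procedure: the histogram-based entropy estimate and the arctan-shaped weighting function are assumed choices.

    import torch

    def filter_entropy(w, bins=10):
        # w: conv weights of shape (out_ch, in_ch, k, k). Estimate the
        # information entropy of each filter from a histogram of its
        # weight values (the binning scheme is an illustrative assumption).
        entropies = []
        for f in w.flatten(1):                  # one row per filter
            hist = torch.histc(f, bins=bins)
            p = hist / hist.sum()
            p = p[p > 0]
            entropies.append(-(p * p.log()).sum())
        return torch.stack(entropies)           # (out_ch,)

    def graft(w_invalid, w_donor, c=1.0):
        # Graft donor knowledge onto a low-entropy (invalid) filter bank.
        # The grafting proportion alpha grows with the entropy gap; the
        # arctan shaping below is a hypothetical adaptive function.
        gap = filter_entropy(w_donor).mean() - filter_entropy(w_invalid).mean()
        alpha = 0.5 + torch.atan(c * gap) / torch.pi
        return alpha * w_donor + (1 - alpha) * w_invalid

When the entropy gap is zero, alpha is 0.5 and the two filter banks are simply averaged; a strongly superior donor pushes alpha toward 1, so the grafted filter inherits mostly donor knowledge.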
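One possible reading of the filter skeleton as a PyTorch module is sketched below; the module name, initialization, and pruning threshold are assumptions made for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FSConv2d(nn.Module):
        # Convolution with a filter skeleton: every stripe (one kernel
        # position across all input channels) is scaled by a learnable
        # mask entry of shape (out_ch, k, k).
        def __init__(self, in_ch, out_ch, k, padding=1):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
            self.skeleton = nn.Parameter(torch.ones(out_ch, k, k))
            self.padding = padding

        def forward(self, x):
            # Broadcast the skeleton over the input-channel dimension so
            # each stripe shares a single learnable scale.
            w = self.weight * self.skeleton.unsqueeze(1)
            return F.conv2d(x, w, padding=self.padding)

        def l1_penalty(self):
            # Added to the task loss to push unneeded stripes toward zero.
            return self.skeleton.abs().sum()

        def prune(self, threshold=0.01):
            # Zero out stripes whose skeleton value is negligible; they
            # can then be dropped and the conv computed stripe-wise.
            mask = (self.skeleton.abs() > threshold).float()
            self.skeleton.data.mul_(mask)

During training, a term such as lam * l1_penalty() is added to the loss so that redundant stripes shrink toward zero; after prune(), the surviving stripes can be regrouped into a stripe-wise convolution, which keeps the acceleration structured.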
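Distillation with a fixed teacher can be sketched with the standard softened-logit loss of Hinton et al.; the temperature T and mixing weight lam below are illustrative hyperparameters, and teacher_logits are assumed to come from the frozen teacher under torch.no_grad().

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.9):
        # Soft targets: match the teacher's temperature-softened output
        # distribution; T*T rescales gradients to the hard-loss magnitude.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # Hard targets: ordinary cross-entropy with the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return lam * soft + (1 - lam) * hard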