
Research On Convolutional Neural Network Acceleration

Posted on: 2022-06-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L H Zeng
Full Text: PDF
GTID: 1488306323464184
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, artificial intelligence has developed rapidly and has been applied in many fields. In computer vision in particular, techniques built on convolutional neural network models have surpassed traditional methods and produced many breakthroughs. However, a typical convolutional neural network consists of a dozen or even dozens of functional layers, and each layer performs floating-point operations on the order of one hundred million, which imposes a very large computational overhead. Accelerating convolutional neural network models has therefore received growing attention. Broadly, any method that reduces the amount of computation, the computation time, or the power consumption required to evaluate a convolutional neural network model is a model acceleration method, and such acceleration is essential to the large-scale application of these models.

A convolutional neural network model deployed in practice can be divided, across its software components and hardware devices, into different levels: each level solves part of the problem, and the levels are relatively independent, connected through interfaces. At present, the waste of computing resources in convolutional neural network models shows up at three such levels. At the numerical computation level, most neural network models are designed around floating-point arithmetic with 32-bit or 64-bit precision, yet many tasks, or some of the algorithms within them, do not need such high precision. At the parallel computation level, many models are designed only to strengthen learning capacity during training and raise the final accuracy while ignoring model size, so they carry an enormous number of parameters and consume a great deal of computation. At the network model level, most existing acceleration methods remain theoretical; research on accelerating actual running time, especially for specific tasks, is still at a relatively early stage. Targeting these problems, this dissertation designs four acceleration algorithms across the three levels.

At the numerical computation level, multiplication consumes more chip area and power than addition. This dissertation quantizes the original floating-point parameters into binary parameters, called bit planes, so that the multiplications inside the convolution operation can be removed and the computational overhead reduced. Multiple bit planes with fine amplitude adjustment are used to raise the quantization accuracy. The method retains most of the information-carrying capability of floating-point features while cutting the deployed computation through a dedicated low-level design, without causing a significant drop in accuracy. A sketch of the idea follows.
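As a hedged illustration of bit-plane quantization, the sketch below greedily decomposes a float weight tensor into a few signed binary planes with per-plane scales, so an inner product needs only sign flips, additions, and one scalar multiply per plane. The greedy decomposition rule and the plane count are assumptions made for this sketch, not the dissertation's exact scheme.

```python
import numpy as np

def to_bit_planes(w, n_planes=4):
    """Decompose a float tensor w into n_planes signed binary planes.

    Approximates w as sum_k alpha_k * b_k with b_k in {-1, +1}, greedily
    fitting each plane to the remaining residual. For a fixed sign plane,
    alpha = mean(|residual|) is the least-squares optimal scale.
    """
    residual = w.copy()
    planes, scales = [], []
    for _ in range(n_planes):
        b = np.sign(residual)
        b[b == 0] = 1.0                     # avoid zero entries in the plane
        alpha = np.mean(np.abs(residual))   # per-plane scale factor
        planes.append(b)
        scales.append(alpha)
        residual = residual - alpha * b
    return planes, scales

def bitplane_dot(x, planes, scales):
    """Multiplication-free inner product: each plane contributes only
    sign flips and additions, plus one scalar multiply per plane."""
    acc = 0.0
    for b, alpha in zip(planes, scales):
        acc += alpha * np.where(b > 0, x, -x).sum()
    return acc

w = np.random.randn(64)
x = np.random.randn(64)
planes, scales = to_bit_planes(w, n_planes=4)
print(np.dot(x, w), bitplane_dot(x, planes, scales))  # approximate agreement
```

Using more planes tightens the approximation at the cost of more additions, which is the accuracy/overhead trade-off the fine amplitude adjustment is meant to balance.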
At the parallel computation level, this dissertation redesigns the existing convolution operation to reduce the floating-point computation along with the parameter count. The proposed sparse convolution decomposition converts spatial redundancy into redundancy between convolution kernels, and an automatic pruning method then eliminates that inter-kernel redundancy. By removing both kinds of redundancy, the method achieves a higher accuracy rate than existing methods at the same acceleration rate, and it can be deployed more conveniently on existing frameworks (a pruning sketch is given after this abstract).

At the network model level, this dissertation accelerates models for specific tasks. Since this differs slightly from traditional model acceleration, it can be called time-efficient optimization. For the model redundancy in the stereo matching task, a cascaded residual decomposition method and an adaptive refinement method are proposed. The decomposition applies linear approximation and feature reuse to accelerate cost optimization, the most time-consuming stage of stereo matching, by reducing the feature dimension, which greatly cuts running time without a significant drop in accuracy (see the low-dimensional aggregation sketch below). The adaptive refinement uses the texture characteristics of the image to refine the coarse matching result and has a high degree of parallelism, so it spends little running time on parallel computing equipment.

For the model redundancy in localizing actions in video with natural language queries, this dissertation proposes a new candidate model and feature fusion method based on convolutional networks. The candidate model generates high-precision candidate time windows directly from multi-scale, reused features, which greatly reduces the number of candidate windows and hence the amount of computation (see the candidate-scoring sketch below). To improve the final retrieval accuracy, a moment-focused feature fusion method is proposed that fuses features with a highly parallel convolutional network; it is effective and runs quickly on parallel computing devices.
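As a minimal, assumed illustration of removing inter-kernel redundancy by pruning, the sketch below drops the lowest-L1-norm output kernels of one PyTorch convolution and the matching input channels of the following layer. The magnitude criterion and keep ratio are stand-ins; the dissertation's automatic pruning rule is not reproduced here.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv, next_conv, keep_ratio=0.5):
    """Remove the lowest-L1-norm output channels of `conv` and the
    matching input channels of `next_conv` (BatchNorm layers omitted
    for brevity in this sketch)."""
    w = conv.weight.data                      # (out_c, in_c, kH, kW)
    importance = w.abs().sum(dim=(1, 2, 3))   # L1 norm per output kernel
    n_keep = max(1, int(keep_ratio * w.size(0)))
    keep = importance.topk(n_keep).indices.sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()

    nxt = nn.Conv2d(n_keep, next_conv.out_channels, next_conv.kernel_size,
                    next_conv.stride, next_conv.padding,
                    bias=next_conv.bias is not None)
    nxt.weight.data = next_conv.weight.data[:, keep].clone()
    if next_conv.bias is not None:
        nxt.bias.data = next_conv.bias.data.clone()
    return pruned, nxt

c1, c2 = nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 128, 3, padding=1)
p1, p2 = prune_conv_channels(c1, c2, keep_ratio=0.5)
print(p2(p1(torch.randn(1, 3, 32, 32))).shape)  # torch.Size([1, 128, 32, 32])
```

Because the pruned layers are still ordinary dense convolutions, they run on existing frameworks without custom kernels, which matches the deployability claim above.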
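The stereo acceleration rests on a reduce-aggregate-restore pattern. The sketch below, with assumed layer names and an assumed reduction ratio, projects cost-volume features to a lower dimension with 1x1x1 convolutions, runs the heavy 3-D aggregation in that cheap space, and restores the result as a residual; it illustrates the idea of linear approximation with feature reuse, not the dissertation's exact architecture.

```python
import torch
import torch.nn as nn

class LowDimCostAggregation(nn.Module):
    """Aggregate a stereo cost volume in a reduced feature dimension."""
    def __init__(self, channels=32, r=4):
        super().__init__()
        self.reduce = nn.Conv3d(channels, channels // r, kernel_size=1)
        self.aggregate = nn.Conv3d(channels // r, channels // r,
                                   kernel_size=3, padding=1)
        self.restore = nn.Conv3d(channels // r, channels, kernel_size=1)

    def forward(self, cost):                   # cost: (N, C, D, H, W)
        low = self.reduce(cost)                # linear projection to C//r dims
        low = torch.relu(self.aggregate(low))  # heavy 3-D filtering, now cheaper
        return cost + self.restore(low)        # residual reuses input features

cost = torch.randn(1, 32, 24, 32, 64)          # disparity x height x width grid
print(LowDimCostAggregation()(cost).shape)     # torch.Size([1, 32, 24, 32, 64])
```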
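Finally, a hypothetical sketch of the convolutional candidate model: a shared 1-D convolution scores one anchor window per position at each temporal scale of a reused feature pyramid, so dense enumeration of time windows is avoided. All module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleCandidateHead(nn.Module):
    """Score candidate time windows directly from multi-scale features."""
    def __init__(self, channels=256, n_scales=3):
        super().__init__()
        self.pool = nn.MaxPool1d(kernel_size=2, stride=2)
        self.score = nn.Conv1d(channels, 1, kernel_size=3, padding=1)
        self.n_scales = n_scales

    def forward(self, feats):                  # feats: (N, C, T)
        scores = []
        for _ in range(self.n_scales):
            # one score per position; window length grows with the scale
            scores.append(self.score(feats).squeeze(1))
            feats = self.pool(feats)           # reuse features at half the rate
        return scores                          # list of (N, T / 2**k) score maps

feats = torch.randn(2, 256, 64)
for s in MultiScaleCandidateHead()(feats):
    print(s.shape)  # (2, 64), (2, 32), (2, 16)
```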
Keywords/Search Tags: Deep learning, Convolutional neural network, Model acceleration, Network pruning, Matrix decomposition, Network quantization