
Algorithmic Optimization And Efficient Deployment For Convolutional Neural Networks

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: H N Wang
Full Text: PDF
GTID: 2428330647950679
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, deep learning technology, especially the convolutional neural network (CNN), has shown great performance in many tasks, significantly surpassing traditional algorithms. However, deep learning methods often consume a large amount of computing power and memory. In the model training stage, supercomputers or high-performance graphics processing units can meet the computing power requirements; in the deployment stage for mobile and edge applications, however, one still needs to deal with the constraints and tradeoffs among speed, accuracy and resources. More specifically, a powerful computing platform is needed to process a huge deep learning model at high speed, but in application scenarios where hardware and power resources are limited, CPUs and high-performance GPUs cannot serve as deployment platforms for deep learning algorithms because of their huge power consumption. Hence, it is vital to design an efficient hardware accelerator architecture dedicatedly optimized for deep learning algorithms, so as to achieve processing speed similar to that of a high-performance platform under limited resources. Meanwhile, research has shown that deep learning models often contain considerable redundancy, so compression methods can significantly reduce the model size and improve the inference speed, though they may also introduce accuracy loss. How to jointly optimize the model compression rate and the accuracy loss has therefore become a research hotspot in this field. In addition, with the expansion of deep learning into different application fields, the mainstream model compression methods may no longer be widely applicable; optimization algorithms that exploit the data distribution of specific realms to compensate for the accuracy loss brought by compression are the trend of further development of deep learning. Therefore, this paper focuses on the above three aspects, aims at some key problems and challenges in these areas, and puts forward some innovative solutions.

Firstly, this paper introduces our research on model compression and hardware acceleration of model inference. A mainstream convolutional neural network often contains millions of parameters, its inference demands billions of multiply-accumulate operations, and convolutions account for more than 90% of the total operations. Therefore, this paper introduces the cascaded Fast FIR Algorithm (FFA) into CNNs for convolution complexity reduction, which reduces the computing complexity by 41.5% on average. It should be noticed, however, that hardware architectures for the Fast FIR Algorithm are usually designed and optimized for one specific convolution kernel size and cannot effectively support all mainstream sizes, resulting in low reconfigurability or poor hardware efficiency for this kind of accelerator. Hence, a reconfigurable convolution acceleration unit is proposed based on the cascaded Fast FIR Algorithm, which can efficiently support convolution kernels of different sizes, achieving 76.4% hardware utilization efficiency over five mainstream sizes and supporting all kernel sizes from 1×1 to 12×12. Besides, an efficient data flow is proposed based on this convolution unit. Experimental results show that the proposed structure is superior to other advanced convolution accelerators in terms of reconfigurability and computation complexity.

To further improve the deployment efficiency of CNNs, this paper introduces the sparse Winograd algorithm, which prunes the weights in the Winograd domain and greatly reduces the computing load to 8%. Based on this algorithm, an efficient low-latency CNN accelerator is designed. The proposed architecture innovatively designs a binary mask indexing unit for sparse weights and activations to skip all redundant multiplications, leveraging the advantages of workload reduction and clock-cycle saving. Besides, an efficient asynchronous data fetching scheme is proposed, which can effectively alleviate the workload imbalance brought by sparsity in the data. Compared with other state-of-the-art Winograd accelerators, our design significantly reduces the consumption of hardware resources, saving 50% of DSPs, 29% of BRAMs, 41% of LUTs and 38% of registers, and achieving a maximum latency reduction of 1.7 to 5.1 times.

In addition, the model compression and hardware deployment schemes above target 2D CNN models. With the development of computer vision, 2D vision tasks have matured, and more works have begun to research how to apply CNNs to 3D vision tasks. Due to the higher computational complexity of 3D tasks, research on 3D lightweight models becomes particularly important, but lightweighting often leads to significant attenuation of model accuracy, so this paper further explores the joint optimization of algorithmic accuracy improvement and model compression. For the task of video recognition, a lightweight fully separated convolution module is proposed, which can greatly reduce the model size and the computational complexity. Then, considering the distribution of video data, a temporal feature enhancement module based on the lightweight model is proposed, which can greatly improve the generalization performance with only slight overhead. The experimental results show that, when applied to a state-of-the-art video recognition network, our method brings a 2.3x model compression rate and significantly recovers 7.9% of the model accuracy loss.

To sum up, our works take the efficient deployment of CNNs as the critical target. Using methods of algorithm optimization, model compression and hardware architecture design, we put forward solutions for all stages of efficient deployment. Compared with other state-of-the-art works, our works achieve great performance improvement.
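The multiplication savings of the Fast FIR Algorithm can be illustrated with its simplest 2-parallel form: the filter and input are split into even/odd polyphase components, and three half-length sub-convolutions replace the four a direct polyphase decomposition would need (about a 25% reduction; the cascaded FFA in the thesis extends this factorization further). A minimal sketch, not the thesis's hardware design:

```python
import numpy as np

def ffa2_conv(x, h):
    """1D convolution via the 2-parallel fast FIR algorithm (FFA-2).

    Computes three sub-convolutions H0*X0, H1*X1 and (H0+H1)*(X0+X1)
    instead of the four products of a plain polyphase decomposition.
    """
    assert len(x) % 2 == 0 and len(h) % 2 == 0, "pad to even length first"
    x0, x1 = x[0::2], x[1::2]            # even / odd input samples
    h0, h1 = h[0::2], h[1::2]            # even / odd filter taps
    p0 = np.convolve(h0, x0)             # H0 * X0
    p1 = np.convolve(h1, x1)             # H1 * X1
    p2 = np.convolve(h0 + h1, x0 + x1)   # (H0 + H1) * (X0 + X1)
    y_odd = p2 - p0 - p1                 # odd-indexed outputs
    # even-indexed outputs: H0*X0 + z^{-1} H1*X1 (one-block delay on p1)
    y_even = np.concatenate([p0, [0.0]]) + np.concatenate([[0.0], p1])
    n_out = len(x) + len(h) - 1
    y = np.zeros(n_out)
    y[0::2] = y_even[: (n_out + 1) // 2]
    y[1::2] = y_odd[: n_out // 2]
    return y
```

The result matches `np.convolve(x, h)` exactly; the gain is that each of the three sub-convolutions works on half-length sequences.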
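The idea of pruning in the Winograd domain and masking out the skipped multiplications can be sketched for a single F(2x2, 3x3) tile, using the standard Winograd transform matrices. The `threshold` pruning rule and the function name are illustrative assumptions, not the thesis's actual pruning scheme or accelerator logic:

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_sparse_tile(d, g, threshold=0.3):
    """One F(2x2,3x3) output tile with pruning in the Winograd domain.

    The 3x3 kernel g is transformed to the 4x4 Winograd domain, small
    entries are zeroed there (not in the spatial domain), and a binary
    mask records which of the 16 element-wise products can be skipped --
    the software analogue of a mask-indexed multiplier array.
    """
    U = G @ g @ G.T                   # 4x4 transformed kernel
    mask = np.abs(U) > threshold      # binary mask: True = keep product
    U = U * mask                      # prune in the Winograd domain
    V = B_T @ d @ B_T.T               # 4x4 transformed input tile
    M = np.where(mask, U * V, 0.0)    # only unmasked positions multiply
    skipped = int(mask.size - mask.sum())  # multiplications eliminated
    Y = A_T @ M @ A_T.T               # inverse transform: 2x2 output tile
    return Y, skipped
```

With `threshold=0.0` (no pruning) the tile reproduces the direct 3x3 cross-correlation of the 4x4 input patch; raising the threshold trades accuracy for skipped multiplications, which is exactly the compression/accuracy tradeoff the abstract discusses.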
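To see why separating a 3D convolution shrinks the model, it is enough to count parameters. The sketch below assumes one plausible factorization (depthwise temporal + depthwise spatial + pointwise); the thesis's fully separated module may differ in its exact decomposition:

```python
def conv3d_params(cin, cout, kt, kh, kw):
    """Parameter count of a standard dense 3D convolution layer."""
    return cin * cout * kt * kh * kw

def separable3d_params(cin, cout, kt, kh, kw):
    """Parameter count for an illustrative fully separated variant:
    depthwise temporal (kt x 1 x 1) + depthwise spatial (1 x kh x kw)
    + pointwise (1 x 1 x 1) channel mixing."""
    return cin * kt + cin * kh * kw + cin * cout

# Example: a 64->64 channel layer with a 3x3x3 kernel
dense = conv3d_params(64, 64, 3, 3, 3)       # 110592 parameters
sep = separable3d_params(64, 64, 3, 3, 3)    # 4864 parameters
```

For this layer the illustrative factorization cuts parameters by more than 20x, which is why lightweight 3D modules must then recover accuracy with additions such as the temporal feature enhancement module described above.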
Keywords/Search Tags:Convolutional Neural Network, algorithm optimization, model compression, hardware acceleration