Research On Acceleration Technologies For Convolution Computation In Deep Learning

Posted on: 2024-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2568307127454604
Subject: Electronic information
Abstract/Summary:
In high-performance computing, acceleration technologies for convolution computation aim to speed up the forward inference stage of convolutional neural networks (CNNs) and are applied to the convolutional layers of various CNN models. To learn finer image features, CNNs have become deeper and more complex, which causes the amount of computation and the number of parameters to grow rapidly. CNNs involve a large amount of compute-intensive work, which makes them difficult to deploy on embedded devices with limited memory capacity and computing power, so such computations are commonly run on high-performance processor clusters. Even so, the speed of the compute-intensive convolutional layers still falls far short of practical needs, and acceleration technologies for convolution computation have therefore attracted wide attention. This thesis studies acceleration technologies for convolution computation in deep learning and applies them to three convolution algorithms on heterogeneous multi-core architecture chips. The main contributions are as follows.

(1) Acceleration of direct convolution based on a heterogeneous multi-core architecture. An acceleration method for direct convolution is proposed. First, the data access scheme is optimized. Second, double buffering is integrated into the data loading stage, and asynchronous loading is used to overlap computation with data communication so that communication latency is hidden. Finally, a systolic array is used in the convolution computation stage to further accelerate direct convolution. Multiple convolutional layers of VGG-16 and ResNet-50 were selected for several groups of experiments, and the results all show the effectiveness of the acceleration techniques applied to direct convolution. For example, on VGG-16, compared with parallel direct convolution, the highest effective performance reaches 126.65 GFLOPS and the highest speedup reaches 7.45, with correctness preserved.

(2) Acceleration of group convolution based on an improved Im2col transform. The Im2col transform suffers from large memory consumption and slow data rearrangement. First, an improved Im2col transform is proposed to address these two problems. Second, a multi-core mapping method is designed to solve the data allocation problem. Finally, for the compute-intensive part, a data-parallel scheme using SIMD (Single Instruction, Multiple Data) vector instructions is proposed, which greatly improves computational efficiency. Experiments were carried out on five convolutional layers of AlexNet, with results within the accepted accuracy error. Compared with serial group convolution based on the Im2col transform, the highest effective performance reaches 186.71 MFLOPS and the highest speedup reaches 79.52.

(3) Acceleration of Winograd convolution based on a heterogeneous multi-core architecture. Because Winograd convolution is widely used in convolution computation, it is necessary to extend it spatially. First, SIMD vector instructions are used to extend Winograd convolution from two-dimensional to three-dimensional space. Second, SIMD vector instructions are used to parallelize the data transformations in the four steps of Winograd convolution. Finally, the Hadamard product step is converted into matrix multiplication, which is implemented on a systolic array. With multiple convolutional layers of VGG-16 and FusionNet as test cases, Winograd convolution combined with these acceleration techniques reaches a maximum effective performance of 1915.67 GFLOPS and a maximum speedup of 314.27, with correctness preserved.
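
To make the third contribution more concrete, the following is a minimal pure-Python sketch of how the Hadamard product in Winograd convolution can be recast as matrix multiplication once several input channels, output channels, and tiles are processed together. It uses the standard one-dimensional F(2,3) transform matrices and checks the result against direct convolution. This is not the thesis's implementation: the function names, the 1D setting, and the data layout are assumptions chosen for illustration, and hardware aspects (SIMD instructions, the array-based matrix multiplier) are not modeled.

```python
# Standard Winograd F(2,3) matrices: 2 outputs per tile from a
# 4-sample input tile and a 3-tap filter.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]  # input transform
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]     # filter transform
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]                               # output transform


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a (R x P) by matrix b (P x Q)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def direct_conv(x, w):
    """Reference: x[c][n], w[k][c][3] -> y[k][n-2] (valid, stride 1)."""
    C, N, K = len(x), len(x[0]), len(w)
    return [[sum(x[c][i + r] * w[k][c][r] for c in range(C) for r in range(3))
             for i in range(N - 2)] for k in range(K)]


def winograd_conv(x, w):
    """Same result as direct_conv, computed with F(2,3) tiles."""
    C, N, K = len(x), len(x[0]), len(w)
    assert N >= 4 and (N - 2) % 2 == 0, "output length must be even"
    T = (N - 2) // 2  # number of tiles; consecutive tiles overlap by 2 samples

    # Step 1: filter transform, stored as U[xi][k][c].
    Uw = [[matvec(G, w[k][c]) for c in range(C)] for k in range(K)]
    U = [[[Uw[k][c][xi] for c in range(C)] for k in range(K)] for xi in range(4)]

    # Step 2: input transform, stored as V[xi][c][t].
    Vx = [[matvec(BT, x[c][2 * t:2 * t + 4]) for t in range(T)] for c in range(C)]
    V = [[[Vx[c][t][xi] for t in range(T)] for c in range(C)] for xi in range(4)]

    # Step 3: the element-wise (Hadamard) product summed over channels becomes,
    # for each of the 4 transform positions xi, one (K x C) @ (C x T) matmul.
    M = [matmul(U[xi], V[xi]) for xi in range(4)]  # M[xi][k][t]

    # Step 4: output transform, scattering 2 outputs per tile.
    y = [[0.0] * (N - 2) for _ in range(K)]
    for k in range(K):
        for t in range(T):
            y[k][2 * t], y[k][2 * t + 1] = matvec(AT, [M[xi][k][t] for xi in range(4)])
    return y


# Hypothetical test data: 2 input channels, 6 samples, 3 output channels.
x = [[1, 2, 3, 4, 5, 6], [0, -1, 2, -3, 4, -5]]
w = [[[1, 0, -1], [2, 1, 0]], [[0, 1, 0], [1, 1, 1]], [[-1, 2, 1], [0, 0, 3]]]
y_ref = direct_conv(x, w)
y_win = winograd_conv(x, w)
```

In this layout the four small matrix products in step 3 are independent of one another, which is what makes the step a natural fit for a dedicated matrix-multiplication unit. The two-dimensional F(2x2,3x3) case follows the same pattern with 16 transform positions instead of 4.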
Keywords/Search Tags:High performance computing, Deep learning, Convolution computing, Heterogeneous multi-core architecture, Acceleration technique