Research On Acceleration Technologies For Convolution Computation In Deep Learning

Posted on:2024-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:M Zhang

GTID:2568307127454604

Subject:Electronic information

Abstract/Summary:

In high performance computing, acceleration technologies for convolution computation aim to speed up the forward inference stage of convolutional neural network (CNN) models and are applied to the convolutional layers of various CNNs. To learn more detailed image features, CNNs have become deeper and more complex, so both the amount of computation and the number of parameters grow rapidly. Because CNNs involve a large volume of intensive computation, they are difficult to deploy on embedded devices with limited memory capacity and computing power, so these computations are typically run on high performance processor clusters. Even so, the computing speed of convolutional layers still falls far short of practical needs, and acceleration technologies for convolution computation have therefore attracted wide attention. This thesis studies acceleration technologies for convolution computation in deep learning and applies them to three convolution algorithms on heterogeneous multi-core architecture chips. The main contributions are as follows.

(1) Acceleration of direct convolution based on a heterogeneous multi-core architecture. An acceleration method for direct convolution is proposed. First, the data access scheme is optimized. Second, double buffering is introduced in the data loading stage, so that asynchronous loading overlaps data communication with computation and hides its latency. Finally, a systolic array is used in the computation stage to further accelerate direct convolution. Several groups of experiments were carried out on multiple convolutional layers of VGG-16 and ResNet-50, and all results show that the acceleration techniques are effective for direct convolution. For example, on VGG-16, with correctness preserved, the highest effective performance reaches 126.65 GFLOPS and the highest speedup over parallel direct convolution reaches 7.45.

(2) Acceleration of group convolution based on an improved im2col transform. The im2col transform suffers from large memory consumption and slow data rearrangement. First, an improved im2col transform is proposed to address these two problems. Second, a multi-core mapping method is designed to solve the data allocation problem. Finally, for the compute-intensive part, a data-parallel scheme using SIMD (Single Instruction, Multiple Data) vector instructions is proposed, which greatly improves computational efficiency. Experiments were carried out on five convolutional layers of AlexNet, with results within the accuracy error tolerance. Compared with serial group convolution based on the im2col transform, the highest effective performance reaches 186.71 MFLOPS and the highest speedup reaches 79.52.

(3) Acceleration of Winograd convolution based on a heterogeneous multi-core architecture. Because Winograd convolution is widely used in convolution computation, it is necessary to extend it spatially. First, SIMD vector instructions are used to extend Winograd convolution from two-dimensional to three-dimensional space. Second, SIMD vector instructions are used to parallelize the data transforms in the four steps of Winograd convolution. Finally, the Hadamard product is converted into matrix multiplication, which is implemented on a systolic array. Using multiple convolutional layers of VGG-16 and FusionNet as test cases, and with correctness preserved, Winograd convolution combined with these acceleration techniques reaches a maximum effective performance of 1915.67 GFLOPS and a maximum speedup of 314.27.
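
For readers unfamiliar with the three convolution algorithms named in the abstract, the following is a minimal pure-Python sketch of how they relate. It is not the thesis's implementation: it assumes a single channel, stride 1, and no padding, and it does not model SIMD instructions, double buffering, group convolution, or a systolic array. All function names are hypothetical. The Winograd part assumes the commonly used F(2x2, 3x3) transform matrices; the abstract does not say which tile size the thesis uses.

```python
def direct_conv2d(x, w):
    """Direct sliding-window convolution (cross-correlation, as in CNNs)."""
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[i + a][j + b] * w[a][b] for a in range(k) for b in range(k))
             for j in range(out_w)] for i in range(out_h)]


def im2col(x, k):
    """Unroll every k x k window of x into one row of a patch matrix."""
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[i + a][j + b] for a in range(k) for b in range(k)]
            for i in range(out_h) for j in range(out_w)]


def im2col_conv2d(x, w):
    """Convolution as (patch matrix) x (flattened filter)."""
    k = len(w)
    out_w = len(x[0]) - k + 1
    flat_w = [v for row in w for v in row]
    flat = [sum(p * q for p, q in zip(patch, flat_w)) for patch in im2col(x, k)]
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# Assumed: standard Winograd F(2x2, 3x3) transform matrices.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def winograd_tile(d, g):
    """Map one 4x4 input tile d and a 3x3 filter g to a 2x2 output tile."""
    u = matmul(matmul(G, g), transpose(G))      # step 1: filter transform
    v = matmul(matmul(BT, d), transpose(BT))    # step 2: input transform
    m = [[u[i][j] * v[i][j] for j in range(4)]  # step 3: Hadamard product
         for i in range(4)]
    return matmul(matmul(AT, m), transpose(AT))  # step 4: output transform


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    w = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(direct_conv2d(x, w))
    print(im2col_conv2d(x, w))
    print(winograd_tile(x, w))
```

All three functions should produce the same 2x2 output for the same input. The im2col variant trades extra memory (each input element is copied into several patch rows) for a single dense matrix product, which is the memory cost the abstract refers to. The Winograd variant replaces the 36 multiplications of the direct method for this tile with 16 element-wise multiplications in the Hadamard step, at the price of the extra transforms.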

Keywords/Search Tags:

High performance computing, Deep learning, Convolution computing, Heterogeneous multi-core architecture, Acceleration technique

Related items

1	Research On Parallelization Of Machine Learning Algorithms For On-chip Heterogeneous Multi-core Systems
2	Study Of Heterogeneous Multi-core Acceleration Methods For Convolutional Neural Networks On Reconfigurable Platform
3	Research On Architecture Of Multi-core Processor For High-Density Computing
4	Research On High Performance Parallel Computing Architecture Based On FPGA+DSP
5	Design And Implementation Of Deep Learning Application Inference Execution Optimization System For Edge Computing Environment
6	Optimal Design Of Multi-core Heterogeneous Processors For High-density Computing Applications
7	Research On High-Performance Computing Supporting Technologies For Large-Scale 3D Terrain Construction
8	Research On Sparse Matrix Multiplication Acceleration Technology For Sunway Architecture
9	Design And Research Of Deep Learning Heterogeneous Computing System Based On FPGA
10	Research On Parallelization Of Scientific Computing Kernels On Multi-core Platform