
Research On Software And Hardware Optimization Of Convolution Operation In Neural Network

Posted on: 2022-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: H Qi
Full Text: PDF
GTID: 2518306323962439
Subject: Computer system architecture
Abstract/Summary:
In recent years, with the rapid development and wide application of deep learning, neural network models have grown ever deeper, and their computation and memory-access demands have grown with them, posing major challenges for computer hardware design and software optimization. The convolutional neural network is one of the representative algorithms of deep learning; convolution is a computation- and memory-intensive operation, and convolutional layers account for more than 90% of the computation time of an entire convolutional neural network. Optimizing the convolution operation is therefore essential for accelerating deep learning algorithms.

Because mobile devices are constrained in both computing power and power consumption, many lightweight networks have emerged, such as Xception and the MobileNet series. These lightweight networks introduce a new type of convolution, called depthwise convolution. In depthwise convolution, the input and output feature maps have the same number of channels in one-to-one correspondence, and there is no accumulation across the channels of the input feature map. Depthwise convolution layers account for 31% to 50% of all convolutional layers in these networks, so optimizing depthwise convolution is likewise a problem worth studying.

Optimizing both depthwise convolution and normal convolution is essential for accelerating neural networks. This dissertation performs software and hardware co-optimization for depthwise convolution and normal convolution respectively; the main contributions are as follows:

1. Neither general-purpose CPUs nor fixed-vector-length SIMD processors can efficiently handle the depthwise convolutions of various scales found in neural networks, and their performance is low. This dissertation proposes a hardware architecture with multiple weight-transmission modes. Combined with optimization methods such as software mode selection, data splitting, software pipelining, and data reuse, it improves computing efficiency while reducing the amount of memory access. Experimental results show that, when implementing the depthwise convolutions in classic neural networks, the work described in this dissertation improves performance by up to 9.3 times over an Intel i7-8850H CPU and 29.3 times over a single-core SIMD processor with a vector length of 64.

2. General-purpose GPUs and accelerators with fixed data-residency and transmission modes cannot efficiently handle the normal convolutions of various scales found in neural networks. This dissertation first proposes a six-dimensional data format called SDF. The hardware then provides multiple data-residency modes and weight-transmission modes to reduce the amount of on-chip data transfer. Finally, combined with software mode selection, data-format conversion, and support and optimization for various convolution parameters, it effectively addresses the computational performance of convolutions of various scales. Experimental results show that, when implementing the convolutions in classic neural networks, the work described in this dissertation achieves performance improvements of up to 5.72 times over an NVIDIA RTX 8000 GPU and 7.35 times over a DaDianNao-like accelerator.
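The defining property of depthwise convolution described above (channels in one-to-one correspondence, no accumulation across input channels) can be illustrated with a minimal sketch. This is not the dissertation's hardware design, only a reference implementation of the operation's semantics; the function name and NCHW-style channel-first layout are assumptions for illustration.

```python
import numpy as np

def depthwise_conv2d(x, w):
    """Depthwise convolution (stride 1, no padding).

    x: input feature map, shape (C, H, W)
    w: one filter per channel, shape (C, KH, KW)

    Unlike normal convolution, each input channel is convolved only with
    its own filter; results are never summed across channels, so the
    output has exactly C channels in one-to-one correspondence.
    """
    c, h, wd = x.shape
    _, kh, kw = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((c, oh, ow), dtype=x.dtype)
    for ch in range(c):          # channels map one-to-one, no cross-channel sum
        for i in range(oh):
            for j in range(ow):
                out[ch, i, j] = np.sum(x[ch, i:i + kh, j:j + kw] * w[ch])
    return out
```

A normal convolution would instead sum the per-channel products over all C input channels for each output channel, which is exactly the accumulation step that depthwise convolution omits.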
Keywords/Search Tags: depthwise convolution, convolution, accelerator, software and hardware collaborative optimization, computing efficiency