
Optimization and Acceleration of Convolutional Neural Network

Posted on: 2020-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: J C Wang
GTID: 2518305732998859
Subject: Microelectronics and Solid State Electronics

Abstract/Summary:
Convolutional neural networks (CNNs) have become one of the most popular deep learning algorithms thanks to their remarkable performance in image, speech, and text applications. Real-time CNN implementations in resource-limited embedded systems are highly desired despite the inherent computational complexity, and CNN-based applications such as image generation and speech recognition have great prospects on mobile devices; this is the trend of CNN deployment. All of it depends on high-accuracy, low-power CNN implementation techniques.

In the early development of deep learning, both efficient training and fast inference were carried out on dedicated graphics processing units (GPUs). GPUs, however, do not reduce the intrinsic computational complexity. With the rapid development of CNNs, diverse models keep emerging: from early baseline networks such as AlexNet [1], VGGNet [2], and ResNet [3], to high-accuracy networks such as DenseNet [4] and FractalNet [5], to lightweight networks such as MobileNet (V1, V2) [6,7] and SqueezeNet [8], to today's well-known generative adversarial networks (GANs) [9]. Although these models differ greatly in structure, their basic operations are similar: convolutions account for more than 90% of the computation in a CNN implementation, demand large amounts of on-chip storage, and consume considerable power. We therefore focus on reducing CNN computational complexity, reducing on-chip storage, and optimizing external memory bandwidth.

Recent GAN models also include deconvolution, the inverse (transposed) counterpart of convolution. Implementing deconvolution with traditional convolution approaches causes redundant computation (many multiplications involve inserted zeros) and memory overhead, so deconvolution is a further target of our optimization.

Our design rests on three techniques. First, based on the parallel fast finite impulse response (FIR) algorithm (FFA), we design an efficient implementation of standard convolution. Second, using an equivalent deconvolution-to-convolution transform, we map deconvolution onto the regular convolution accelerator. Third, a layer-fusion and resource-partition scheme considerably reduces the required on-chip resources, uses external memory bandwidth efficiently, and resolves the bandwidth-imbalance problem.

In this paper, we first derive the 3-parallel and 5-parallel FFAs theoretically and, based on them, design 3-parallel and 5-parallel fast convolution units (FCUs), which reduce the multiplications of 3 × 3 and 5 × 5 convolutions by 30% and 40%, respectively. A reconfigurable FCU is designed to further save computational resources. We implement our designs on a Xilinx FPGA platform and outperform comparable works by 2x in resource utilization. The proposed storage architecture saves 14x memory resources compared with traditional approaches and keeps all intermediate results on chip. The demonstration design achieves 33 fps on 224 × 224 image classification, 3x the rate of comparable works. The proposed bandwidth-efficient architecture applies resource partitioning and computation pipelining, which substantially increases system throughput and reduces bandwidth by 2x compared with similar works.
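The FFA saving can be checked numerically. The sketch below is not the thesis's FPGA design but a minimal Python illustration (the function name ffa3_conv1d and the block-state handling are my own): a 3-parallel FFA for a length-3 FIR filter computes each block of three outputs with six multiplications instead of the nine needed by the direct form, matching the roughly 30% saving claimed for 3 × 3 convolutions when applied per kernel row.

```python
import numpy as np

def ffa3_conv1d(x, h):
    """3-parallel fast FIR: 6 multiplications per block of 3 outputs,
    versus 9 for the direct form of a length-3 filter."""
    h0, h1, h2 = h
    n = len(x) - len(x) % 3          # process whole blocks of 3 samples
    y = np.zeros(n)
    d12 = d11 = d22 = 0.0            # products delayed by one block (z^-3)
    for k in range(0, n, 3):
        x0, x1, x2 = x[k], x[k + 1], x[k + 2]
        p00 = h0 * x0                # the six multiplications
        p11 = h1 * x1
        p22 = h2 * x2
        p01 = (h0 + h1) * (x0 + x1)
        p12 = (h1 + h2) * (x1 + x2)
        p02 = (h0 + h2) * (x0 + x2)
        y[k]     = p00 + (d12 - d11 - d22)  # y(3k)   = h0*x0 + delayed cross terms
        y[k + 1] = p01 - p00 - p11 + d22    # y(3k+1) = h0*x1 + h1*x0 + delayed h2*x2
        y[k + 2] = p02 - p00 - p22 + p11    # y(3k+2) = h0*x2 + h1*x1 + h2*x0
        d12, d11, d22 = p12, p11, p22
    return y

x = np.random.randn(12)
h = np.array([0.5, -1.0, 2.0])
assert np.allclose(ffa3_conv1d(x, h), np.convolve(x, h)[:12])
```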
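The deconvolution-to-convolution equivalence the accelerator relies on can be sketched the same way. Assuming 1-D single-channel data for brevity (the thesis's transform works on 2-D feature maps and, on hardware, additionally splits the kernel so that multiplications by inserted zeros are skipped), a stride-s transposed convolution equals zero insertion followed by an ordinary convolution:

```python
import numpy as np

def deconv1d(x, w, stride):
    """Direct transposed convolution: each input sample scatters a
    scaled copy of the kernel into the output."""
    y = np.zeros((len(x) - 1) * stride + len(w))
    for k, xk in enumerate(x):
        y[k * stride : k * stride + len(w)] += xk * w
    return y

def deconv1d_as_conv(x, w, stride):
    """Same result via zero insertion + ordinary convolution, so a
    regular convolution accelerator can be reused; the inserted zeros
    are the redundant work that kernel splitting removes."""
    x_up = np.zeros((len(x) - 1) * stride + 1)
    x_up[::stride] = x               # place samples stride apart
    return np.convolve(x_up, w)      # full linear convolution

x, w = np.random.randn(5), np.random.randn(4)
assert np.allclose(deconv1d(x, w, 2), deconv1d_as_conv(x, w, 2))
```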
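Layer fusion is the remaining architectural idea: compute a stripe of one layer's output and consume it in the next layer while it is still on chip, so the full intermediate feature map never travels to external memory. A minimal single-channel Python sketch, with hypothetical names (conv3x3, fused_two_layers) and an arbitrary stripe height of 8 rows:

```python
import numpy as np

def conv3x3(x, w):
    """Valid (no-padding) 3x3 convolution on one 2-D channel."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
    return out

def fused_two_layers(image, w1, w2, tile_rows=8):
    """conv -> conv computed one horizontal stripe at a time: each
    stripe of the layer-1 output feeds layer 2 immediately."""
    H, W = image.shape
    out_rows = H - 4                 # two valid 3x3 convs shrink height by 4
    out = np.zeros((out_rows, W - 4))
    for r0 in range(0, out_rows, tile_rows):
        r1 = min(r0 + tile_rows, out_rows)
        stripe = image[r0 : r1 + 4, :]   # stripe plus 4-row halo
        mid = conv3x3(stripe, w1)        # intermediate stays "on chip"
        out[r0:r1, :] = conv3x3(mid, w2)
    return out

img = np.random.randn(16, 16)
w1, w2 = np.random.randn(3, 3), np.random.randn(3, 3)
assert np.allclose(fused_two_layers(img, w1, w2),
                   conv3x3(conv3x3(img, w1), w2))
```

Only the stripe and its halo are buffered at any time, which is where the on-chip memory saving and the bandwidth reduction reported above come from.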
Keywords/Search Tags: Convolutional Neural Network (CNN), convolution and deconvolution, optimization and acceleration, bandwidth and storage optimization, FPGA