
Optimization And Implementation For FPGA-based Deep Learning Accelerator

Posted on: 2019-01-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y Huang  Full Text: PDF
GTID: 2428330611993617  Subject: Computer Science and Technology
Abstract/Summary:
As one of the most popular deep learning algorithms, convolutional neural networks (CNNs) have achieved great success in many applications, such as image recognition and speech recognition. Recently, the computational complexity of CNNs has increased significantly along with continuous improvements in recognition accuracy. Since CPUs can hardly provide the massive parallelism required by CNNs, many hardware accelerators, such as GPUs, FPGAs, and ASICs, have been developed to improve CNN performance. Among these designs, FPGA-based accelerators have become a particularly attractive option due to their high energy efficiency and reconfigurability. Previous FPGA implementations of CNNs are mainly based on the conventional convolution algorithm. However, the high arithmetic complexity of conventional convolution restricts accelerator performance and significantly increases the design challenges. It has been shown that the Winograd algorithm can effectively reduce the computational complexity of CNNs. Although a few FPGA designs based on the Winograd algorithm have been implemented, they lack an evaluation of performance across different tile sizes of the Winograd algorithm. In this work, we explore the use of the Winograd algorithm to accelerate CNNs on FPGAs. First, we propose an accelerator architecture that applies to both convolutional layers and fully connected layers. Second, we use a high-level synthesis tool to implement our design conveniently. Finally, we evaluate our accelerator with different tile sizes in terms of resource utilization, performance, and efficiency. On the VUS440 platform, we achieve an average of 943 GOPS for the overall VGG16 network with low resource utilization.

As the scale of convolutional neural networks continues to increase, a single FPGA restricts accelerator performance due to its limited computation and storage resources. Moreover, different CNN layers require different computation resources and bandwidth, which makes it difficult for a single-chip FPGA to accelerate the entire CNN efficiently. In this work, we propose a parallel acceleration scheme for CNNs based on multiple FPGAs and explore the optimal mapping of a CNN onto multiple FPGAs. Finally, we build a multi-FPGA system and evaluate it on VGG16. Compared with a CPU and a GPU, we achieve better performance in terms of latency and energy efficiency.
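The claim that the Winograd algorithm reduces the arithmetic complexity of convolution can be illustrated with the smallest 1-D case, F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The sketch below uses the standard transform matrices from Lavin and Gray's formulation; it is an illustration only, not the accelerator described in the thesis, and the function name winograd_f23 is our own.

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap filter using 4 multiplications
# instead of 6 (standard transform matrices; illustrative sketch only).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation on a 4-element input tile."""
    U = G @ g      # transform the filter (3 -> 4)
    V = B_T @ d    # transform the input tile (4 -> 4)
    M = U * V      # 4 element-wise multiplications
    return A_T @ M # inverse transform -> 2 outputs

d = np.array([1., 2., 3., 4.], dtype=np.float32)
g = np.array([1., 1., 1.], dtype=np.float32)
print(winograd_f23(d, g))                 # [6. 9.]
print(np.convolve(d, g[::-1], 'valid'))   # matches direct correlation
```

Larger tiles such as F(4,3) or F(6,3) save more multiplications per output but require wider transforms and more on-chip resources, which is the tile-size trade-off the work evaluates.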
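The abstract does not spell out how CNN layers are mapped onto multiple FPGAs, so the following is only a hypothetical sketch of one common formulation: assign contiguous groups of layers to a pipeline of FPGAs so that the heaviest stage is as light as possible. The function best_pipeline_split and the per-layer workload numbers are placeholders, not values from the thesis.

```python
def best_pipeline_split(layer_gops, num_fpgas):
    """Contiguous layer groups minimizing the maximum per-FPGA workload."""
    n = len(layer_gops)
    prefix = [0.0]
    for g in layer_gops:
        prefix.append(prefix[-1] + g)

    INF = float("inf")
    # dp[k][i]: minimal achievable max-stage load using k FPGAs for layers [0, i)
    dp = [[INF] * (n + 1) for _ in range(num_fpgas + 1)]
    cut = [[0] * (n + 1) for _ in range(num_fpgas + 1)]
    dp[0][0] = 0.0
    for k in range(1, num_fpgas + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):
                load = max(dp[k - 1][j], prefix[i] - prefix[j])
                if load < dp[k][i]:
                    dp[k][i], cut[k][i] = load, j

    # Recover which layers each FPGA gets.
    groups, i = [], n
    for k in range(num_fpgas, 0, -1):
        j = cut[k][i]
        groups.append(list(range(j, i)))
        i = j
    return list(reversed(groups)), dp[num_fpgas][n]

# Placeholder per-layer workloads (GOPs), not measured values.
gops = [1.0, 2.0, 4.0, 4.0, 2.0, 2.0, 1.0]
print(best_pipeline_split(gops, 3))
```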
Keywords/Search Tags:convolutional neural networks, FPGA, Winograd algorithm, multi-FPGA