
Optimization And Implementation For FPGA-based Deep Learning Accelerator

Posted on: 2019-01-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y Huang  Full Text: PDF
GTID: 2428330611993617  Subject: Computer Science and Technology
Abstract/Summary:
As one of the most popular deep learning algorithms, convolutional neural networks (CNNs) have achieved great success in many applications, such as image recognition and speech recognition. Recently, the computational complexity of CNNs has increased significantly along with continuous improvements in recognition accuracy. Since CPUs can hardly provide the massive parallelism required by CNNs, many hardware accelerators, such as GPUs, FPGAs, and ASICs, have been developed to improve CNN performance. Among these designs, FPGA-based accelerators have become a particularly attractive option due to their high energy efficiency and reconfigurability. Previous FPGA implementations of CNNs are mainly based on the conventional convolution algorithm. However, the high arithmetic complexity of conventional convolution restricts accelerator performance and significantly increases the design challenges. It has been shown that the Winograd algorithm can effectively reduce the computational complexity of CNNs. Although a few FPGA designs based on the Winograd algorithm have been implemented, they lack an evaluation of performance across different tile sizes of the Winograd algorithm. In this work, we explore the use of the Winograd algorithm to accelerate CNNs on FPGAs. First, we propose an accelerator architecture that applies to both convolutional layers and fully connected layers. Second, we use a high-level synthesis tool to implement our design conveniently. Finally, we evaluate our accelerator with different tile sizes in terms of resource utilization, performance, and efficiency. On the VUS440 platform, we achieve an average of 943 GOPS for the overall VGG16 network with low resource utilization.

As the scale of convolutional neural networks continues to increase, a single FPGA restricts accelerator performance due to its limited computation and storage resources. Moreover, different CNN layers require different computation resources and bandwidth, which makes it difficult for a single-chip FPGA to accelerate the entire CNN efficiently. In this work, we propose a parallel acceleration scheme for CNNs based on multiple FPGAs and explore the optimal mapping of a CNN onto multiple FPGAs. Finally, we build a multi-FPGA system and evaluate it on VGG16. Compared with a CPU and a GPU, we achieve better performance in terms of latency and energy efficiency.
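The claim that the Winograd algorithm reduces the arithmetic complexity of convolution can be illustrated with the smallest 1-D case, F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The sketch below uses the standard transform matrices from Lavin and Gray's formulation; it is an illustration only, not the accelerator described in the thesis, and the function name winograd_f23 is our own.

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap filter using 4 multiplications
# instead of 6 (standard transform matrices; illustrative sketch only).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation on a 4-element input tile."""
    U = G @ g      # transform the filter (3 -> 4)
    V = B_T @ d    # transform the input tile (4 -> 4)
    M = U * V      # 4 element-wise multiplications
    return A_T @ M # inverse transform -> 2 outputs

d = np.array([1., 2., 3., 4.], dtype=np.float32)
g = np.array([1., 1., 1.], dtype=np.float32)
print(winograd_f23(d, g))                 # [6. 9.]
print(np.convolve(d, g[::-1], 'valid'))   # matches direct correlation
```

Larger tiles such as F(4,3) or F(6,3) save more multiplications per output but require wider transforms and more on-chip resources, which is the tile-size trade-off the work evaluates.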
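The abstract does not spell out how CNN layers are mapped onto multiple FPGAs, so the following is only a hypothetical sketch of one common formulation: assign contiguous groups of layers to a pipeline of FPGAs so that the heaviest stage is as light as possible. The function best_pipeline_split and the per-layer workload numbers are placeholders, not values from the thesis.

```python
def best_pipeline_split(layer_gops, num_fpgas):
    """Contiguous layer groups minimizing the maximum per-FPGA workload."""
    n = len(layer_gops)
    prefix = [0.0]
    for g in layer_gops:
        prefix.append(prefix[-1] + g)

    INF = float("inf")
    # dp[k][i]: minimal achievable max-stage load using k FPGAs for layers [0, i)
    dp = [[INF] * (n + 1) for _ in range(num_fpgas + 1)]
    cut = [[0] * (n + 1) for _ in range(num_fpgas + 1)]
    dp[0][0] = 0.0
    for k in range(1, num_fpgas + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):
                load = max(dp[k - 1][j], prefix[i] - prefix[j])
                if load < dp[k][i]:
                    dp[k][i], cut[k][i] = load, j

    # Recover which layers each FPGA gets.
    groups, i = [], n
    for k in range(num_fpgas, 0, -1):
        j = cut[k][i]
        groups.append(list(range(j, i)))
        i = j
    return list(reversed(groups)), dp[num_fpgas][n]

# Placeholder per-layer workloads (GOPs), not measured values.
gops = [1.0, 2.0, 4.0, 4.0, 2.0, 2.0, 1.0]
print(best_pipeline_split(gops, 3))
```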
Keywords/Search Tags:convolutional neural networks, FPGA, Winograd algorithm, multi-FPGA