
Design And Research Of Deep Learning Heterogeneous Computing System Based On FPGA

Posted on: 2021-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: H N Qiang
Full Text: PDF
GTID: 2518306122967009
Subject: IC Engineering

Abstract/Summary:
In recent years, deep learning has been widely applied in many fields, especially computer vision and speech recognition, and has driven the progress of artificial intelligence technology. Massive computing power is the foundation supporting the development of deep learning, and meeting the ever-increasing demand for it has become a hot research direction. Heterogeneous computing, with its excellent performance and flexible structure, has become a mainstream solution for improving computing power. However, running complex neural networks with large data throughput requires extensive optimization from the algorithm down to the hardware structure, and high-performance, low-power FPGA platforms address the need for highly parallel hardware.

Based on the OpenCL and ZYNQ heterogeneous computing frameworks, this thesis compares and analyzes the characteristics of on-chip and inter-chip heterogeneity and proposes an FPGA-based optimization and implementation method for heterogeneous computing. The method partitions the computing task and assigns code segments performing the same operation to the same subtask. According to the requirements of each subtask, a reusable and configurable acceleration kernel is developed on the FPGA, giving it a high reuse rate and flexibility. The highly parallel kernel design exploits the computing power of the hardware, and pipelining between kernels increases system throughput. In addition, optimizing memory access and interface communication improves the performance and energy consumption of the kernels. Compared with traditional heterogeneous computing platforms, the system has a flexible structure and an excellent energy-efficiency ratio and can be used in small embedded devices without affecting classification accuracy.

The implementation method is verified on the DE5-Net FPGA Development Kit and a Zynq XC7Z035FFG676 development board. Based on a theoretical model of hardware resources, bandwidth, and power consumption, the peak floating-point operation speed of the neural network on each device is obtained. Experimental results of a convolutional neural network built with this method show that image recognition takes only 1.1 ms at a power of 2.5 W; compared with an ARM processor of the same power consumption, the convolution operation on the heterogeneous platform is 46.8 times faster.
Keywords/Search Tags:heterogeneous computing, hardware acceleration, low power consumption, neural network, kernel reuse