
A Convolutional Neural Network Accelerator Based On FPGA

Posted on: 2020-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q Jian
Full Text: PDF
GTID: 2428330572967294
Subject: Circuits and Systems
Abstract/Summary:
At present, the performance of general-purpose processors on neural network workloads is far from ideal. Fortunately, field-programmable gate arrays (FPGAs) are reconfigurable and low-power, which makes them easy to adapt to the computation patterns of neural networks. Although neural networks reduce computational complexity through sparse connectivity, weight sharing, and subsampling, they remain compute- and resource-intensive, and are therefore difficult to deploy on resource-constrained devices. Against this background, this thesis makes the following contributions:

1. Large-scale neural networks have high resource requirements and cannot be deployed on an FPGA device in their entirety. This thesis analyzes the computation process of neural networks, decomposes it into basic computing units, and embeds the subsampling controller inside the convolution controller through multiplexing, so that the computing module obtains more resources and the computational density increases. (A sketch of this multiplexed compute unit follows the results below.)

2. The computing core in most accelerator designs is sensitive to the number of channels, so performance degrades when the channel count changes. A mirror-tree structure is therefore designed that separates channel-dependent computational logic from the hardware structure, ensuring consistent computational efficiency across channel configurations.

3. In the back-propagation pass of convolutional neural networks, feature maps contain a large number of zero-valued elements, which lowers resource utilization and computational efficiency. A compression strategy is therefore applied that eliminates the zero-valued elements of the feature map, reducing storage and accelerating computation. (An illustrative compression sketch also follows below.)

Experimental results show that this implementation reaches 22.74 GOPS with 32-bit fixed- and floating-point arithmetic. Compared with the MAPLE accelerator, computational density improves by 283.3% and computation speed by 224.9%. Compared with the MCA (Memory-Centric Accelerator), computational density improves by 14.47% and computation speed by 33.76%. With fixed-point precision between 8 and 16 bits, performance reaches 58.3 GOPS, and computational density improves by 8.5% over the LBA (Layer-Based Accelerator).
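The abstract does not give implementation details for the multiplexed convolution/subsampling controller. The following is a minimal software sketch of the underlying idea, with hypothetical names such as window_engine: a single window-scanning datapath serves both convolution (multiply-accumulate) and subsampling (max reduction), so no separate pooling unit is needed.

```python
import numpy as np

def sliding_windows(fmap, k, stride):
    """Yield (row, col, k-by-k window) over a 2-D feature map."""
    h, w = fmap.shape
    for r in range(0, h - k + 1, stride):
        for c in range(0, w - k + 1, stride):
            yield r, c, fmap[r:r + k, c:c + k]

def window_engine(fmap, k, stride, mode, kernel=None):
    """One compute unit, two operations: the same window scan feeds either
    a multiply-accumulate (convolution) or a max reduction (subsampling)."""
    out_h = (fmap.shape[0] - k) // stride + 1
    out_w = (fmap.shape[1] - k) // stride + 1
    out = np.empty((out_h, out_w), dtype=fmap.dtype)
    for r, c, win in sliding_windows(fmap, k, stride):
        out[r // stride, c // stride] = (
            np.sum(win * kernel) if mode == "conv" else win.max()
        )
    return out

# Convolve with a 3x3 kernel, then reuse the same engine for 2x2 max pooling.
x = np.arange(36, dtype=np.float32).reshape(6, 6)
conv = window_engine(x, 3, 1, "conv", kernel=np.ones((3, 3), dtype=np.float32))
pooled = window_engine(conv, 2, 2, "pool")
```

In hardware terms, sharing the window scan between the two operations is what frees resources for more parallel compute units, which is the density gain the thesis claims.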
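Likewise, the exact compression format for zero-valued feature-map elements is not specified in the abstract. As one plausible scheme, the sketch below assumes a bitmask-plus-value-stream encoding (the names compress and sparse_mac are illustrative): zeros consume no storage, and the multiply-accumulate touches only the surviving nonzero elements.

```python
import numpy as np

def compress(fmap):
    """Encode a feature map as a stream of nonzero values plus a one-bit
    occupancy mask, so zero elements consume no storage."""
    mask = fmap != 0
    return fmap[mask], mask

def sparse_mac(values, mask, weights):
    """Multiply-accumulate over the nonzero elements only; positions that
    held zeros are skipped rather than multiplied by a weight."""
    return float(np.dot(values, weights[mask]))

fmap = np.array([[0.0, 3.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 2.0]])
weights = np.full_like(fmap, 0.5)
values, mask = compress(fmap)
# Only 3 of 9 elements survive; the dense and sparse dot products agree.
assert sparse_mac(values, mask, weights) == float(fmap.ravel() @ weights.ravel())
```

Any zero-skipping encoding of this kind trades a small index overhead (here, one bit per element) for fewer memory accesses and multiplier cycles, which matches the storage and speed gains the thesis reports for the back-propagation pass.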
Keywords/Search Tags: Neural Network, Parallel Computing, Mirror-tree Structure, Hardware Acceleration, FPGA