
Research On Convolutional Neural Network Accelerator Based On Multi-threaded Architecture

Posted on: 2021-04-10    Degree: Master    Type: Thesis
Country: China    Candidate: W G Chen    Full Text: PDF
GTID: 2428330623465057    Subject: Computer technology
Abstract/Summary:
With the advent of the big-data era, convolutional neural networks, which have more hidden layers than traditional machine learning models, possess more complex network structures and stronger feature-learning and feature-expression capabilities. Since their introduction, convolutional neural networks have achieved remarkable results in computer vision, speech recognition, and natural language processing. To improve accuracy, ever deeper network structures have been designed, but parameter counts and computational loads have grown dramatically with them, so general-purpose computing platforms such as CPUs and GPUs face performance and energy-efficiency challenges. To solve these performance and energy-efficiency problems, customized accelerators for convolutional neural networks have become a research hotspot.

However, current convolutional neural network accelerators still have shortcomings. At the hardware level, modern accelerators mainly raise computing power by increasing the operating frequency and the number of computing units, and they already face problems such as low utilization of computing units and poor scalability. At the software level, many accelerators still target inefficient networks such as AlexNet and VGG16; because of their large demands on computing and storage resources, these networks have been abandoned by more advanced computer vision applications.

To address the hardware problems, we propose a multi-threaded architecture that can be flexibly configured and dynamically scaled, together with a new memory access mode designed for it. To address the software problems, we take MobileNet, a compact convolutional neural network, as one of the target networks. The main work of this thesis is as follows:

1. We analyze the computation pattern of convolutional neural networks and propose a computing module with a reconfigurable data flow. The module can dynamically configure its data flow, computing units, and storage through instructions, and it supports six calculation modes: standard convolutional layer, activation layer, pooling layer, depthwise convolutional layer, pointwise convolutional layer, and fully connected layer.

2. We propose a multi-threaded architecture for convolutional neural networks that computes sliding windows and output channels in parallel. The architecture abstracts the array of computing units into threads, with each thread processing one sliding window: multiple output channels are computed in parallel within a thread, and multiple sliding windows are computed in parallel across threads. The architecture also shares the feature map within a thread and shares weights between threads, reducing the demand for on-chip memory and memory-access bandwidth.

3. Based on the multi-threaded architecture, we propose a line fetch mode that reduces the number of accelerator memory fetches. Experiments show that with LeNet as the target network, this memory access mode yields a 1.6x speedup. The architecture also scales well dynamically: 32 threads achieve a 3.83x speedup over 4 threads. The speedup falls short of linear because adding threads accelerates only the computation, not the memory accesses. Compared with similar designs, the throughput and energy efficiency of this architecture are 1.28x and 2.82x, respectively.

4. The accelerator supports MobileNet's distinctive depthwise and pointwise convolutional layers, and it fuses the batch normalization layer into the pointwise convolutional layer. Experiments show that with MobileNet as the target network, the architecture again scales well dynamically: 32 threads achieve a 3.58x speedup over 4 threads, sub-linear for the same reason as above. Compared with similar designs, the throughput and energy efficiency of this architecture are 3.61x and 1.16x, respectively.
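The thread mapping described in item 2 can be illustrated with a minimal software sketch. This is not the accelerator's hardware design, only a toy model of the mapping under our stated abstraction: each "thread" owns a round-robin share of the sliding-window positions, weights are the data shared across threads, and all output channels for a window are produced together inside the thread. The function name and the single-input-channel simplification are illustrative choices, not from the thesis.

```python
import numpy as np

def conv_multithread_sketch(feature, weights, num_threads=4):
    """Toy model of the sliding-window thread mapping: each 'thread'
    processes its share of window positions; within a thread, all
    output channels for that window are computed at once.
    feature: (H, W) single-channel input; weights: (C_out, K, K)."""
    H, W = feature.shape
    c_out, K, _ = weights.shape
    out = np.zeros((c_out, H - K + 1, W - K + 1))
    windows = [(i, j) for i in range(H - K + 1) for j in range(W - K + 1)]
    for t in range(num_threads):
        # Windows are distributed round-robin; weights are read-only
        # shared state, the window patch is private to the thread.
        for (i, j) in windows[t::num_threads]:
            patch = feature[i:i + K, j:j + K]
            # All C_out output channels for this window in one step.
            out[:, i, j] = np.tensordot(weights, patch,
                                        axes=([1, 2], [0, 1]))
    return out
```

Because each thread touches disjoint output positions, the per-thread loops are independent and could run concurrently, which is the property the architecture exploits.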
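Item 3's fetch reduction can be made concrete with a back-of-the-envelope count. The thesis abstract does not spell out the line fetch mode's internals, so the model below rests on an assumption: naive window-at-a-time access re-reads every overlapping element, while a line-based mode streams each input row from off-chip memory once and reuses it on chip across all windows that cover it. Both function names are illustrative.

```python
def naive_fetches(H, W, K, stride=1):
    """Window-at-a-time access: every K x K window is fetched in full,
    so overlapping elements are read from memory repeatedly."""
    n_windows = ((H - K) // stride + 1) * ((W - K) // stride + 1)
    return n_windows * K * K

def line_fetches(H, W):
    """Assumed line fetch mode: each input row is streamed from memory
    exactly once and buffered on chip for reuse across windows."""
    return H * W

# For a LeNet-style 28x28 input with a 5x5 kernel:
#   naive: 24 * 24 * 25 = 14400 element fetches
#   line:  28 * 28      = 784 element fetches
```

The gap between the two counts is what makes a row-reuse scheme attractive when the computation is memory-bound, which is consistent with the observed 1.6x speedup on LeNet.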
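The batch-normalization fusion in item 4 is a standard algebraic fold: because a 1x1 (pointwise) convolution and batch normalization are both affine per output channel, the BN scale and shift can be folded into the convolution's weights and bias offline, so the accelerator runs one layer instead of two. A minimal sketch, with parameter names chosen here for illustration:

```python
import numpy as np

def fuse_bn_pointwise(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-norm layer into the preceding pointwise convolution.
    w: (C_out, C_in) 1x1 conv weights, b: (C_out,) bias;
    gamma, beta, mean, var: per-channel BN parameters.
    BN(w @ x + b) = gamma * (w @ x + b - mean) / sqrt(var + eps) + beta
                  = (scale * w) @ x + (scale * (b - mean) + beta)."""
    scale = gamma / np.sqrt(var + eps)
    w_fused = w * scale[:, None]          # scale each output channel's row
    b_fused = scale * (b - mean) + beta   # fold mean shift into the bias
    return w_fused, b_fused
```

Since the fused weights are computed once before deployment, the fusion costs nothing at inference time while removing the BN layer's memory traffic.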
Keywords/Search Tags:FPGA, Multi-threaded architecture, Convolutional neural network accelerator, Compact convolutional neural network