
Research On Key Techniques Of Deep Convolutional Neural Network Accelerators Based On FPGA Bus Framework

Posted on: 2017-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Zhou
Full Text: PDF
GTID: 2428330569998652
Subject: Computer Science and Technology
Abstract/Summary:
With the boom in intelligent applications based on deep convolutional neural networks (DCNNs), the performance of the inference process has become a research hotspot. Because a large number of operations in a DCNN can be executed in parallel, hardware technology offers an opportunity to accelerate the algorithm. Compared with a high-precision software implementation, a hardware implementation of a neural network that uses finite bit-widths and approximation methods can be executed with massive parallelism while saving resources and reducing bandwidth requirements, yielding a performance gain. At the same time, hardware implementation of deep learning applications faces major challenges, such as limited precision, long development cycles, and scalability.

To address the low-precision problem, this thesis proposes an evaluation method for fixed-point arithmetic in DCNNs, performs a detailed analysis of the effect of low precision on AlexNet, and develops simulation software for DCNN models. The simulation results show that, for typical DCNN applications, the accuracy degradation caused by low-precision arithmetic is very limited. By choosing reasonable bit-widths and approximation methods, accuracy can be preserved while the data width and precision are substantially reduced, providing guidance for large-scale parallel hardware implementations in fixed-point precision.

To shorten the development cycle, this thesis proposes a design and implementation method for DCNNs based on a High-Level Synthesis (HLS) tool and uses it to design a fixed-point hardware accelerator for the LeNet algorithm. The experimental results show that the hardware accelerator achieves a good speedup over the Matlab version; at the same time, there is little difference in performance and resource cost between this method and the traditional Verilog-based design flow.

To cope with the scalability and maintainability problems, this thesis proposes an accelerator framework for deep learning algorithms based on the FPGA standard on-chip bus AXI4 (Advanced eXtensible Interface 4). The framework supports the design of accelerators for DCNNs with diverse layers by providing a multi-port memory controller with a configurable number of ports, and it supports third-party modules based on the AXI4 protocol. An AlexNet accelerator architecture based on AXI4 and pipelining is proposed, and the design and implementation of the system are given. The experimental results show that the accelerator, with improved scalability, maintainability, and testability, roughly doubles the performance of a custom accelerator built without the framework while using slightly more resources.
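The fixed-point evaluation described above can be illustrated with a short simulation sketch. The snippet below is a minimal C++ example, not the thesis's actual simulation software: it quantizes real values to a signed fixed-point grid with configurable integer and fractional bit-widths using round-to-nearest and saturation, and the concrete Q4.4/Q8.8 bit-widths in the example are illustrative assumptions rather than figures reported in the thesis.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Quantize a real value to signed fixed-point with `int_bits` integer bits
// (including sign) and `frac_bits` fractional bits, using round-to-nearest
// and saturation. The bit-widths and rounding mode here are illustrative.
double quantize_fixed(double x, int int_bits, int frac_bits) {
    const double scale = std::pow(2.0, frac_bits);
    const double max_v = std::pow(2.0, int_bits - 1) - 1.0 / scale;
    const double min_v = -std::pow(2.0, int_bits - 1);
    double q = std::round(x * scale) / scale;   // snap to the fixed-point grid
    return std::min(std::max(q, min_v), max_v); // saturate to representable range
}

int main() {
    // Example: emulate an 8-bit (Q4.4) weight and a 16-bit (Q8.8) activation.
    double w = quantize_fixed(0.3732, 4, 4);
    double a = quantize_fixed(5.1891, 8, 8);
    std::printf("weight %.6f  activation %.6f  product %.6f\n", w, a, w * a);
    return 0;
}
```

Sweeping such a quantizer over a network's weights and activations and re-measuring classification accuracy is one way to reproduce the kind of bit-width study the abstract describes for AlexNet.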
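The HLS design flow and the AXI4 interfaces mentioned above can also be sketched at the source level. The fragment below is a hypothetical Vivado-HLS-style convolution kernel, not the accelerator described in the thesis: the function name, array sizes, rescaling shift, and pragma bundles are assumptions, and the pragmas follow the common Xilinx HLS syntax for exposing AXI4 master/slave interfaces and pipelining the inner loop.

```cpp
// Hypothetical HLS-style 2D convolution kernel (single channel, no padding).
// Names, sizes, and pragma bundles are illustrative assumptions; a standard
// C++ compiler simply ignores the unknown "HLS" pragmas.
#define IN_DIM   28   // input feature-map width/height (LeNet-like, assumed)
#define K_DIM     5   // kernel width/height (assumed)
#define OUT_DIM  (IN_DIM - K_DIM + 1)

void conv2d_hls(const short *in, const short *weights, short *out) {
    // Expose data arrays over an AXI4 master port and control over AXI4-Lite.
#pragma HLS INTERFACE m_axi     port=in      offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi     port=weights offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi     port=out     offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=return
    for (int r = 0; r < OUT_DIM; ++r) {
        for (int c = 0; c < OUT_DIM; ++c) {
#pragma HLS PIPELINE II=1
            int acc = 0;  // wide accumulator for the fixed-point MACs
            for (int kr = 0; kr < K_DIM; ++kr)
                for (int kc = 0; kc < K_DIM; ++kc)
                    acc += in[(r + kr) * IN_DIM + (c + kc)]
                         * weights[kr * K_DIM + kc];
            // Rescale the product back to the output Q-format (shift assumed).
            out[r * OUT_DIM + c] = (short)(acc >> 8);
        }
    }
}
```

A kernel exposing AXI4 master ports in this way could, in principle, be attached to the framework's configurable multi-port memory controller alongside other AXI4-compliant third-party modules, which is the kind of composition the abstract attributes to the proposed framework.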
Keywords/Search Tags:Deep Convolutional Neural Network, Fixed-point Arithmetic, High Level Synthesis, Advanced eXtensible Interface 4.0