
Research On Key Techniques Of Deep Convolutional Neural Network Accelerators Based On FPGA Bus Framework

Posted on: 2017-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Zhou
Full Text: PDF
GTID: 2428330569998652
Subject: Computer Science and Technology
Abstract/Summary:
With the boom in intelligent applications based on deep convolutional neural networks (DCNNs), the performance of the inference process has become a research hotspot. Because a large number of operations in a DCNN can be executed in parallel, hardware technology offers an opportunity to accelerate the algorithm. Compared with a high-precision software implementation, a hardware implementation of a neural network that uses finite bit-widths and approximation methods can be executed with massive parallelism while saving resources and reducing bandwidth requirements, yielding a performance gain. At the same time, hardware implementation of deep learning applications faces major challenges, such as limited precision, long development cycles, and scalability.

To address the low-precision problem, this thesis proposes an evaluation method for fixed-point arithmetic in DCNNs, performs a detailed analysis of the effect of low precision on AlexNet, and develops simulation software for DCNN models. The simulation results show that, for typical DCNN applications, the accuracy degradation caused by low-precision arithmetic is very limited. By choosing reasonable bit-widths and approximation methods, accuracy can be preserved while the data width and precision are substantially reduced, providing guidance for large-scale parallel hardware implementations in fixed-point precision.

To shorten the development cycle, this thesis proposes a design and implementation method for DCNNs based on a High-Level Synthesis (HLS) tool and uses it to design a fixed-point hardware accelerator for the LeNet algorithm. The experimental results show that the hardware accelerator achieves a good speedup over the Matlab version; at the same time, there is little difference in performance and resource cost between this method and the traditional Verilog-based design flow.

To cope with the scalability and maintainability problems, this thesis proposes an accelerator framework for deep learning algorithms based on the FPGA standard on-chip bus AXI4 (Advanced eXtensible Interface 4). The framework supports the design of accelerators for DCNNs with diverse layers by providing a multi-port memory controller with a configurable number of ports, and it supports third-party modules based on the AXI4 protocol. An AlexNet accelerator architecture based on AXI4 and pipelining is proposed, and the design and implementation of the system are given. The experimental results show that the accelerator, with improved scalability, maintainability, and testability, roughly doubles the performance of a custom accelerator built without the framework while using slightly more resources.
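The fixed-point evaluation described above can be illustrated with a short simulation sketch. The snippet below is a minimal C++ example, not the thesis's actual simulation software: it quantizes real values to a signed fixed-point grid with configurable integer and fractional bit-widths using round-to-nearest and saturation, and the concrete Q4.4/Q8.8 bit-widths in the example are illustrative assumptions rather than figures reported in the thesis.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Quantize a real value to signed fixed-point with `int_bits` integer bits
// (including sign) and `frac_bits` fractional bits, using round-to-nearest
// and saturation. The bit-widths and rounding mode here are illustrative.
double quantize_fixed(double x, int int_bits, int frac_bits) {
    const double scale = std::pow(2.0, frac_bits);
    const double max_v = std::pow(2.0, int_bits - 1) - 1.0 / scale;
    const double min_v = -std::pow(2.0, int_bits - 1);
    double q = std::round(x * scale) / scale;   // snap to the fixed-point grid
    return std::min(std::max(q, min_v), max_v); // saturate to representable range
}

int main() {
    // Example: emulate an 8-bit (Q4.4) weight and a 16-bit (Q8.8) activation.
    double w = quantize_fixed(0.3732, 4, 4);
    double a = quantize_fixed(5.1891, 8, 8);
    std::printf("weight %.6f  activation %.6f  product %.6f\n", w, a, w * a);
    return 0;
}
```

Sweeping such a quantizer over a network's weights and activations and re-measuring classification accuracy is one way to reproduce the kind of bit-width study the abstract describes for AlexNet.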
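The HLS design flow and the AXI4 interfaces mentioned above can also be sketched at the source level. The fragment below is a hypothetical Vivado-HLS-style convolution kernel, not the accelerator described in the thesis: the function name, array sizes, rescaling shift, and pragma bundles are assumptions, and the pragmas follow the common Xilinx HLS syntax for exposing AXI4 master/slave interfaces and pipelining the inner loop.

```cpp
// Hypothetical HLS-style 2D convolution kernel (single channel, no padding).
// Names, sizes, and pragma bundles are illustrative assumptions; a standard
// C++ compiler simply ignores the unknown "HLS" pragmas.
#define IN_DIM   28   // input feature-map width/height (LeNet-like, assumed)
#define K_DIM     5   // kernel width/height (assumed)
#define OUT_DIM  (IN_DIM - K_DIM + 1)

void conv2d_hls(const short *in, const short *weights, short *out) {
    // Expose data arrays over an AXI4 master port and control over AXI4-Lite.
#pragma HLS INTERFACE m_axi     port=in      offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi     port=weights offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi     port=out     offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=return
    for (int r = 0; r < OUT_DIM; ++r) {
        for (int c = 0; c < OUT_DIM; ++c) {
#pragma HLS PIPELINE II=1
            int acc = 0;  // wide accumulator for the fixed-point MACs
            for (int kr = 0; kr < K_DIM; ++kr)
                for (int kc = 0; kc < K_DIM; ++kc)
                    acc += in[(r + kr) * IN_DIM + (c + kc)]
                         * weights[kr * K_DIM + kc];
            // Rescale the product back to the output Q-format (shift assumed).
            out[r * OUT_DIM + c] = (short)(acc >> 8);
        }
    }
}
```

A kernel exposing AXI4 master ports in this way could, in principle, be attached to the framework's configurable multi-port memory controller alongside other AXI4-compliant third-party modules, which is the kind of composition the abstract attributes to the proposed framework.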
Keywords/Search Tags:Deep Convolutional Neural Network, Fixed-point Arithmetic, High Level Synthesis, Advanced eXtensible Interface 4.0