
Research On Neural Network Accelerator Based On Embedded Platform

Posted on: 2020-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Si
Full Text: PDF
GTID: 2428330620458899
Subject: Integrated circuit engineering
Abstract/Summary:
Artificial Intelligence (AI) surpasses traditional algorithms in performance and accuracy across many applications, and an increasing number of enterprises and data centers are prioritizing AI algorithms and beginning to deploy them. AI applications are also reaching terminal devices, and how to deploy AI applications on embedded devices with high performance is a current hot issue. This thesis concentrates on the design of neural network acceleration circuits for embedded devices.

Since convolutional-layer operations dominate the run time, a multiplication-result-reuse design method is proposed. The algorithm dynamically adapts to different input feature map regions and effectively reduces the number of multiplications in the convolutional layer without loss of accuracy. To further improve the performance of the neural network accelerator, the multiplication-result-reuse algorithm is combined with the Row Stationary dataflow to complete the architecture design of the convolutional layer; on this basis, the full neural network accelerator is designed. The accelerator is tested on the CIFAR-10 and MNIST datasets, and results show that multiplications can be reduced by up to 50% and by at least 16% across different convolutional layers without any loss of accuracy. At the expense of an acceptable accuracy loss, further multiplications can be eliminated.

The design uses Xilinx's Zynq-7000 series ZC702 development board as the embedded system research platform. High-level synthesis is used to implement and optimize the design: the hardware platform of the accelerator is built on the programmable logic (PL), the PYNQ operating system is installed on the processing system (PS), and the hardware driver is then completed. DMA is used to transfer data efficiently between the accelerator on the PL and the PS, meeting the bandwidth requirements. The accelerator ultimately achieves an actual throughput of 3.92 GOP/s and an energy efficiency of 294 frames/J, a 1.2x improvement in throughput and energy efficiency over the reference design.
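The abstract does not detail how multiplication results are reused, so the following is only a minimal software sketch of one plausible form of the idea: in a convolution over quantized activations, the same (weight position, activation value) pair recurs often, and the product can be cached and reused instead of recomputed. The function name, caching key, and counters are illustrative assumptions, not the thesis's hardware design, which reuses results across feature-map regions in circuitry.

```python
import numpy as np

def conv2d_with_mult_reuse(x, w):
    """Naive 2-D convolution (valid padding, stride 1) that caches
    weight*activation products and reuses a product whenever the same
    (weight row, weight col, activation value) triple recurs.

    Illustrative sketch only; returns the output plus counts of
    multiplications performed vs. reused.
    """
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    cache = {}                # (p, q, activation value) -> cached product
    mults = reused = 0
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for p in range(kh):
                for q in range(kw):
                    key = (p, q, float(x[i + p, j + q]))
                    if key in cache:
                        acc += cache[key]   # reuse previous product
                        reused += 1
                    else:
                        prod = w[p, q] * x[i + p, j + q]
                        cache[key] = prod
                        acc += prod
                        mults += 1
            out[i, j] = acc
    return out, mults, reused
```

With low-bit-width quantized feature maps, activation values repeat frequently, so the fraction of reused products can be large; this is consistent with (though not identical to) the 16-50% multiplication reduction reported above.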
Keywords/Search Tags:Multiplication result reuse, Embedded system, Artificial Intelligence, Convolutional neural network, Hardware accelerator