
Research On Neural Network Accelerator Based On Embedded Platform

Posted on: 2020-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Si
Full Text: PDF
GTID: 2428330620458899
Subject: Integrated circuit engineering
Abstract/Summary:
Artificial Intelligence (AI) surpasses traditional algorithms in performance and accuracy across many applications, and an increasing number of enterprises and data centers are prioritizing AI algorithms and beginning to deploy them. AI applications are also reaching terminal devices, and how to deploy AI applications on embedded devices with high performance is a current hot issue. This thesis concentrates on the design of neural network acceleration circuits for embedded devices.

Since convolutional-layer operations dominate the run time, a multiplication-result-reuse design method is proposed. The algorithm dynamically adapts to different input feature map regions and effectively reduces the number of multiplications in the convolutional layer without loss of accuracy. To further improve the performance of the neural network accelerator, the multiplication-result-reuse algorithm is combined with the Row Stationary dataflow to complete the architecture design of the convolutional layer; on this basis, the full neural network accelerator is designed. The accelerator is tested on the CIFAR-10 and MNIST datasets, and results show that multiplications can be reduced by up to 50% and by at least 16% across different convolutional layers without any loss of accuracy. At the expense of an acceptable accuracy loss, further multiplications can be eliminated.

The design uses Xilinx's Zynq-7000 series ZC702 development board as the embedded system research platform. High-level synthesis is used to implement and optimize the design: the hardware platform of the accelerator is built on the programmable logic (PL), the PYNQ operating system is installed on the processing system (PS), and the hardware driver is then completed. DMA is used to transfer data efficiently between the accelerator on the PL and the PS, meeting the bandwidth requirements. The accelerator ultimately achieves an actual throughput of 3.92 GOP/s and an energy efficiency of 294 frames/J, a 1.2x improvement in throughput and energy efficiency over the reference design.
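The abstract does not detail how multiplication results are reused, so the following is only a minimal software sketch of one plausible form of the idea: in a convolution over quantized activations, the same (weight position, activation value) pair recurs often, and the product can be cached and reused instead of recomputed. The function name, caching key, and counters are illustrative assumptions, not the thesis's hardware design, which reuses results across feature-map regions in circuitry.

```python
import numpy as np

def conv2d_with_mult_reuse(x, w):
    """Naive 2-D convolution (valid padding, stride 1) that caches
    weight*activation products and reuses a product whenever the same
    (weight row, weight col, activation value) triple recurs.

    Illustrative sketch only; returns the output plus counts of
    multiplications performed vs. reused.
    """
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    cache = {}                # (p, q, activation value) -> cached product
    mults = reused = 0
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for p in range(kh):
                for q in range(kw):
                    key = (p, q, float(x[i + p, j + q]))
                    if key in cache:
                        acc += cache[key]   # reuse previous product
                        reused += 1
                    else:
                        prod = w[p, q] * x[i + p, j + q]
                        cache[key] = prod
                        acc += prod
                        mults += 1
            out[i, j] = acc
    return out, mults, reused
```

With low-bit-width quantized feature maps, activation values repeat frequently, so the fraction of reused products can be large; this is consistent with (though not identical to) the 16-50% multiplication reduction reported above.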
Keywords/Search Tags:Multiplication result reuse, Embedded system, Artificial Intelligence, Convolutional neural network, Hardware accelerator