FPGA Based Convolutional Neural Network Accelerator Design And Realization

Posted on:2018-11-24

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Wang

GTID:2348330512987995

Subject:Communication and Information System

Abstract/Summary:

With advances in computer performance and further research in machine learning, the convolutional neural network (CNN) has become one of the most popular algorithms in artificial intelligence. However, because of its complex structure and training method, a CNN needs a large amount of computational resources. Conventional CNN implementations train on general-purpose CPUs, which is comparatively slow and ill-suited to real-time applications. Field-programmable gate array (FPGA) technology is highly flexible and lends itself to parallel computing, and it shows strong capability for many algorithms in artificial intelligence, including CNNs. This thesis proposes a novel FPGA-based architecture for accelerating CNN computation. The ETL9B Japanese handwriting database is used to verify its effectiveness. The model achieved 99.7% accuracy, and processing time was reduced by around 90%.

The first chapter introduces CNNs. It mainly describes the basic concepts of CNNs, including the history and development of the CNN algorithm. To aid understanding of the proposed design, an analysis of FPGA-based CNN structures is provided in this chapter. Applications of CNNs are also introduced. The second chapter introduces the mathematical background of CNNs, including image convolution, pooling, activation functions, and the back-propagation algorithm for training neural networks. Developments in CNN structure are also presented. The third chapter describes the principle of computing activation functions with the Coordinate Rotation Digital Computer (CORDIC) algorithm. To accelerate the calculation, this thesis modifies the conventional CORDIC algorithm. A new rotation strategy called the Unified Rotation Strategy (URS) is proposed, which combines the look-up table method with a greedy algorithm. URS-CORDIC can quickly complete the iteration procedure of conventional CORDIC. Chapter four proposes the hardware architecture for the CNN and describes the detailed design, including the CORDIC processor, the convolution computing block, the max-pooling unit, and the control block. On-chip memory usage is also analysed to avoid redundant memory accesses. Finally, chapter five presents the simulation and implementation results of the proposed CNN hardware accelerator, including the CORDIC processor and the convolution computing block. The system is tested on the ETL9B Japanese handwriting database. Compared with a software implementation, training time is drastically reduced, which shows the great advantage of FPGAs for the CNN training procedure.
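
As a rough illustration of how a CORDIC processor can evaluate an activation function, the sketch below implements the conventional hyperbolic CORDIC in rotation mode, not the URS-CORDIC proposed in the thesis, whose rotation strategy is not detailed in this abstract. The choice of tanh and sigmoid as the activation functions, the floating-point arithmetic (hardware would normally use fixed-point shifts and adds), the iteration count, and all function names are assumptions made for illustration only.

```python
import math


def hyperbolic_iteration_indices(n_iters):
    """Return shift indices 1, 2, 3, 4, 4, 5, ..., 13, 13, 14, ...

    Hyperbolic CORDIC must repeat iterations 4, 13, 40, ... to converge.
    """
    indices, i, next_repeat = [], 1, 4
    while len(indices) < n_iters:
        indices.append(i)
        if i == next_repeat and len(indices) < n_iters:
            indices.append(i)                 # repeated iteration
            next_repeat = 3 * next_repeat + 1
        i += 1
    return indices


def cordic_tanh(theta, n_iters=24):
    """Approximate tanh(theta) with conventional hyperbolic CORDIC.

    Valid only inside the basic convergence range (about |theta| <= 1.1);
    range extension is deliberately left out of this sketch.
    """
    if abs(theta) > 1.1:
        raise ValueError("theta is outside the basic CORDIC convergence range")
    x, y, z = 1.0, 0.0, theta
    for i in hyperbolic_iteration_indices(n_iters):
        d = 1.0 if z >= 0 else -1.0           # rotate so that z is driven to 0
        shift = 2.0 ** -i                     # a right shift by i bits in hardware
        x, y = x + d * y * shift, y + d * x * shift
        z -= d * math.atanh(shift)            # a small look-up table in hardware
    # x and y carry the same CORDIC gain, so it cancels in the ratio.
    return y / x


def cordic_sigmoid(t, n_iters=24):
    """Sigmoid via the identity sigmoid(t) = (1 + tanh(t / 2)) / 2."""
    return 0.5 * (1.0 + cordic_tanh(0.5 * t, n_iters))


if __name__ == "__main__":
    for value in (-1.0, -0.25, 0.0, 0.5, 1.0):
        print(value, cordic_tanh(value), math.tanh(value))
```

Each iteration here resolves roughly one more bit of the result, which is the cost a faster rotation strategy aims to reduce. The final division is kept for brevity; a hardware design would have to realize or avoid it by other means.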

Keywords/Search Tags:

CNN, Hardware Acceleration, FPGA, CORDIC, URS-CORDIC, Parallel Computing

Related items

1	The Research Of CORDIC Algorithm Improvement And Its Hardware Implementation
2	The Research Of Efficient Wide Convergence Scaling-free CORDIC Algorithm And Architecture
3	Adaptive CORDIC: Using parallel angle recoding to accelerate CORDIC rotations
4	Implementation Of Linear Frequency Modulation Signal Based On CORDIC Algorithm By FPGA
5	The Optimization And Its FPGA Implementation Of The CORDIC Algorithm For Sine Cosine Calculation
6	Design Of Direct Digital Frequency Synthesizer Based On CORDIC Algorithm And Implementation
7	CORDIC Algorithm-based High-performance FFT Design And Implementation
8	Design Of High Speed DDS Based On Improved CORDIC Algorithm
9	Research On DDS Based On CORDIC Algorithm And Implementation With FPGA
10	Design And Implementation Of Low-Power Hardware Accelerators Based On CORDIC