Design And Verification Of GSD Processor Based On RNN

Posted on:2020-02-01

Degree:Master

Type:Thesis

Country:China

Candidate:J Peng

Full Text:PDF

GTID:2428330602451372

Subject:Engineering

Abstract/Summary:

With the rapid development of artificial intelligence and deep learning, neural networks have attracted growing attention and have achieved strong results in speech recognition and handwriting recognition. As neural networks grow in scale, the speed and performance demanded of processors keep rising, and traditional microprocessors can no longer meet these requirements. It is therefore important to design a dedicated neural network processor. Drawing on a large body of literature, this thesis summarizes the development of neural networks and processors in China and abroad and analyzes the advantages and disadvantages of the various processors. It then proposes a processor that supports the RNN model and can flexibly map various RNN algorithms by means of an instruction set. The research work is as follows.

First, this thesis analyzes the RNN algorithm and decomposes it into an instruction set. The instruction set mainly comprises control instructions, data transfer instructions, vector computation instructions, scalar computation instructions, and other instructions. Based on this instruction set, the overall framework of the processor is designed. By function, the framework is divided into a memory access unit, a vector computation unit, a scalar computation unit, and an instruction unit. The memory access unit controls the storing and loading of data. The vector computation unit performs matrix-vector multiplication, activation function evaluation, and related operations. The scalar computation unit mainly handles address calculation and control. The instruction unit handles instruction dispatch and null instructions. In the instruction dispatch module, a single flag bit is added to each instruction to further exploit opportunities for parallel execution, so that the processor can process instructions in parallel to the greatest extent possible.

Second, this thesis describes in detail the design of several key modules in the processor. In the memory access module, dedicated on-chip storage, SPV and SPM, is designed to support vector and matrix memory access for large amounts of data. In the decompression module, the design draws on several advanced model compression algorithms and proposes a decompression algorithm for weight sharing and sparse matrices. With the decompression module, the processor can flexibly operate on both uncompressed parameters and parameters that have undergone model compression. In addition, this thesis compares the merits of several algorithms to arrive at the hardware design of the activation function, which uses a simple hardware method to meet a higher accuracy requirement.

Finally, after the RTL code for each module and for the overall processor is completed, the processor is verified and evaluated. To verify the processor's function more thoroughly, its key functional modules are verified first. A DDR3 IP core based on the AXI4 protocol is then ported using Vivado, and during verification this IP core is connected to the external interface of the processor. The processor is verified with the NC-Verilog tool and synthesized using a 28 nm process. The accelerator operates at 1.25 GHz, with a total area of 9.37 mm² and an overall power consumption of 9 W. The Elman network is used as the algorithm mapped onto the processor, and the processor's accuracy and error are calculated and analyzed. The average error is 2.43% and the minimum error is 0.04%, which means that the processor can reach a maximum accuracy of 99.96% during inference. This thesis also analyzes memory access bandwidth and throughput for Elman networks of different sizes. The processor designed in this thesis provides a useful reference for research on RNN processors.
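The data flow described in the abstract (decompress weight-shared sparse parameters, then run matrix-vector multiplication and an activation function for an Elman network) can be illustrated with a minimal software sketch. This is not the thesis's hardware design or its storage format: the codebook-plus-column-index layout, the choice of tanh for the hidden layer and sigmoid for the output, and all function and variable names below are assumptions made for illustration only.

```python
import math


def decompress_matrix(codebook, rows, n_cols):
    """Expand a weight-shared sparse matrix into dense rows.

    Assumed (hypothetical) layout: `codebook` holds the shared weight values;
    each entry of `rows` is a (col_indices, code_indices) pair meaning that
    column col_indices[i] holds codebook[code_indices[i]]. Other columns are 0.
    """
    dense = []
    for col_indices, code_indices in rows:
        row = [0.0] * n_cols
        for col, code in zip(col_indices, code_indices):
            row[col] = codebook[code]
        dense.append(row)
    return dense


def matvec(matrix, vector):
    """Dense matrix-vector product, the core job of a vector unit."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def elman_step(w_xh, w_hh, b_h, w_hy, b_y, x, h_prev):
    """One Elman time step:
    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = sigmoid(W_hy h_t + b_y).
    """
    from_input = matvec(w_xh, x)
    from_state = matvec(w_hh, h_prev)
    h = [math.tanh(a + r + b) for a, r, b in zip(from_input, from_state, b_h)]
    y = [sigmoid(o + b) for o, b in zip(matvec(w_hy, h), b_y)]
    return h, y


# Usage: a 2x3 input weight matrix stored as a 2-entry codebook plus indices.
codebook = [0.5, -1.0]
compressed_w_xh = [([0, 2], [0, 1]), ([1], [1])]
w_xh = decompress_matrix(codebook, compressed_w_xh, n_cols=3)
w_hh = [[0.0, 0.0], [0.0, 0.0]]
w_hy = [[1.0, 1.0]]
h, y = elman_step(w_xh, w_hh, [0.0, 0.0], w_hy, [0.0],
                  x=[1.0, 1.0, 1.0], h_prev=[0.0, 0.0])
```

Because decompression yields an ordinary dense matrix, the same `elman_step` runs unchanged on uncompressed parameters, which mirrors the flexibility the abstract attributes to the decompression module.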

Keywords/Search Tags:

Recurrent Neural Network, Processor, Access Unit, Decompression algorithm, Parallel Processing

Related items

1	Research On Emotional Tendency Classification Based On Online Video Website Reviews
2	Parallel Optimization Technology Of Satellite Image Decompression Based On Multi-core Processor
3	Improved Recurrent Neural Network Method And Its Application
4	Research And Implementation Of Speech Recognition Algorithm Based On Recurrent Neural Network
5	Research On Image Description Method Based On Multimodal Recurrent Neural Networks
6	Comparable Corpus Acquisition Of Cambodian-Chinese Parallel Sentence Pairs Based On Bidirectional Recurrent Neural Network
7	Algorithm Design And Optimization Of Recurrent Neural Network Training On GPU Platform
8	Research On The Related Technologies Of Network Processor And Its Processing Unit
9	Research On The Key Technology Of Graphic Processor And The Parallel Structure Of Ray Tracing
10	The Design And Implementation Of Protocol Processor Unit Control Plane Based On Network Processor