
An FPGA Implementation Method Of Efficient Deep Convolution Neural Network Based On HLS

Posted on: 2020-07-25    Degree: Master    Type: Thesis
Country: China    Candidate: D. S. Zhao    Full Text: PDF
GTID: 2428330602952514    Subject: Communication and Information System
Abstract/Summary:
DCNN (Deep Convolutional Neural Network) is a key technology in deep learning and has been widely applied in intelligent processing fields such as target detection, image classification, and speech recognition. As application demands grow, real-time processing systems must be small, low-power, fast, and accurate. However, because a DCNN has many layers and parameters, its computation and data volume are large, making these requirements difficult to meet; it is therefore necessary to study and design a network structure of lower complexity together with an efficient parallel architecture. Compared with a CPU (Central Processing Unit), which executes sequential instruction streams, an FPGA (Field Programmable Gate Array) supports concurrent operation and delivers higher performance per watt than a GPU (Graphics Processing Unit), making it a mainstream platform for real-time DCNN processing. However, traditional RTL (Register Transfer Level)-based FPGA development has low flexibility, portability, and scalability, which greatly limits development efficiency. HLS (High Level Synthesis) can automatically convert high-level languages such as C, C++, and SystemC into HDL (Hardware Description Language) code in VHDL or Verilog, providing new ideas and tools for efficient FPGA design. Through analysis, summary, and improvement of existing research, this paper presents a series of methods for building and training a small, efficient, hardware-friendly network at the software level; for reducing resources, lowering power consumption, and improving speed in the FPGA implementation; and for increasing the flexibility, portability, and scalability of a design in HLS, all of which have strong practical value. Combining these methods, a network named EfficientNet is built and trained, and its inference is accelerated on an FPGA through HLS. By comparison with other networks and platforms, the effectiveness of
these methods is verified.

The main work and contributions of this paper are as follows:

1) A lightweight deep learning network, EfficientNet, is designed and implemented. To address the problem that traditional networks have too many parameters and too much computation to be realized in hardware, this paper, while preserving accuracy, analyzes the methods of replacing standard convolution with depthwise separable convolution, replacing pooling with strided convolution, and replacing the fully connected layer with average pooling; proposes a method of alternately increasing and decreasing channel counts without increasing feature-map size; and builds a low-complexity DCNN named EfficientNet by integrating these methods. Experimental results show that the classification accuracy of EfficientNet on the public Flower_photos dataset is 89.3%. Compared with Inception-v3, EfficientNet loses only 5.7% accuracy while its parameter and computation counts are reduced to one fifty-sixth.

2) An inference acceleration system for EfficientNet is designed and implemented on an FPGA. Focusing on reducing resources, lowering power consumption, and improving speed, this paper introduces a patch mechanism, selects and designs a data reuse mode suited to this work, adopts a streaming convolution circuit, and proposes a deeply pipelined parallel architecture. With these methods, the inference of the EfficientNet designed in this paper is accelerated on an FPGA (ZCU102 development board @ 244 MHz). Experiments show that the acceleration system occupies less than half of the resources of the ZCU102 board and processes 512×512 images at 36 fps, meeting real-time requirements. It is about 66 times faster than a CPU (two six-core E5645 @ 2.40 GHz), nearly twice as fast as a GPU (Tesla K80), and consumes roughly half the power of the GPU.

3) A DCNN function template library is compiled and a design space exploration model is proposed. In order
to solve the problem of low flexibility, reusability, and scalability in traditional FPGA design, this paper makes full use of the C-based development advantages of HLS: C++ function templates are used to implement the functions needed by a DCNN, configuration information is parameterized, input and output interfaces are standardized, and a design space exploration model is proposed. Subsequent researchers can thus easily modify and extend the HLS library; without being limited to the EfficientNet architecture designed in this paper, they can use the evaluation method given here to call the function library, set parameters according to their own needs, and realize a network suited to their own applications.
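The per-layer parameter savings behind the depthwise separable convolution mentioned in contribution 1) can be illustrated with a small sketch. The layer shape below (3×3 kernel, 128 to 256 channels) is hypothetical and not taken from this thesis; it only shows the counting argument, not the network-level 1/56 figure.

```cpp
#include <cassert>

// Parameters of a standard KxK convolution: one KxK filter per
// (input channel, output channel) pair.
long standard_conv_params(int k, int c_in, int c_out) {
    return (long)k * k * c_in * c_out;
}

// Parameters of the depthwise-separable equivalent: one KxK depthwise
// filter per input channel, followed by a 1x1 pointwise convolution.
long separable_conv_params(int k, int c_in, int c_out) {
    return (long)k * k * c_in + (long)c_in * c_out;
}
```

For a 3×3 layer mapping 128 to 256 channels, the counts are 294,912 versus 33,920, a ratio of about 1/K² + 1/C_out ≈ 0.115, which is the mechanism that, compounded over all layers, yields large network-level reductions.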
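One way the C++ template mechanism described in contribution 3) could look is sketched below. This is a minimal illustration of the idea of parameterizing layer configuration at compile time; the function name, array layout, and pragma shown are assumptions, not the thesis's actual library.

```cpp
#include <cassert>

// Hypothetical HLS-style template: kernel size and layer dimensions are
// compile-time parameters, so one function body can be instantiated once
// per layer with its own configuration.
template <int K, int C_IN, int C_OUT, int H, int W>
void conv2d(const float in[C_IN][H][W],
            const float weight[C_OUT][C_IN][K][K],
            float out[C_OUT][H - K + 1][W - K + 1]) {
    for (int co = 0; co < C_OUT; ++co)
        for (int y = 0; y + K <= H; ++y)
            for (int x = 0; x + K <= W; ++x) {
                // An HLS tool would pipeline this inner computation,
                // e.g. with: #pragma HLS PIPELINE II=1
                float acc = 0.0f;
                for (int ci = 0; ci < C_IN; ++ci)
                    for (int ky = 0; ky < K; ++ky)
                        for (int kx = 0; kx < K; ++kx)
                            acc += in[ci][y + ky][x + kx]
                                 * weight[co][ci][ky][kx];
                out[co][y][x] = acc;
            }
}
```

Because every dimension is a template parameter, the synthesis tool sees fixed loop bounds and array sizes for each instantiation, which is what makes such a library both reusable across networks and amenable to design space exploration.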
Keywords/Search Tags:DCNN, FPGA, HLS, High Speed, Low Power, Small Size, High Accuracy