| In recent years, artificial intelligence has achieved great success in both theory and application. As the most important research direction in artificial intelligence, deep learning can solve increasingly abstract and complex problems. However, as problems become more abstract and complex, the scale of deep learning networks grows, and so does model training time. Research on acceleration techniques for deep learning algorithms has therefore become a major trend. Compared with the Central Processing Unit (CPU), Graphics Processing Unit (GPU), and Application Specific Integrated Circuit (ASIC), the Field Programmable Gate Array (FPGA) offers the following advantages for accelerating deep learning algorithms: high speed, low power consumption, stability, and low latency; suitability for streaming computation-intensive tasks and communication-intensive tasks; flexibility and a short development cycle; and low cost and portability. At present, little research addresses the specific architecture of FPGA implementations of deep learning algorithms, and even less addresses FPGA acceleration of the training process. The Convolutional Neural Network (CNN) is one of the most important deep learning algorithms and has achieved breakthroughs in applications such as speech and image recognition. Based on the CNN, this thesis studies and implements FPGA acceleration of deep learning from four aspects: the basic principles of the algorithms, model optimization and simulation modeling, general hardware architecture design, and FPGA implementation. First, the thesis introduces deep learning theory, including the Deep Neural Network (DNN) and CNN algorithms, and studies optimization methods such as basic model parameter selection, regularization, and dropout. It then proposes a specific LeNet CNN model that performs well at a small model scale, reaching an accuracy of 96.64%. Second, the thesis studies the general hardware architecture of the CNN forward prediction process and the backward training process. It proposes a serial-to-matrix conversion structure based on shift registers, and a hardware architecture for the main arithmetic units of the convolutional and pooling layers based on a Systolic Array (SA). This modular and scalable architecture can build a CNN model of any size while increasing clock frequency and computational throughput and reducing I/O bandwidth requirements. In addition, taking computation time and resource consumption into account, the thesis proposes a hardware design for the Softmax layer based on piecewise fitting approximation. Finally, based on this hardware architecture, the thesis completes the FPGA implementation of the prediction and training processes of the LeNet CNN and verifies and analyzes system performance. MATLAB fixed-point simulation of the prediction and training processes is completed first, followed by functional simulation in ModelSim after the system modules are built. The design is then implemented on XC7K325T-2FFG900 and XC7VX690T-2FFG1157 devices. Lastly, the FPGA implementation is compared with a CPU and a GPU in terms of speed and power consumption: the FPGA is about 3 times faster than the CPU, while the CPU and GPU consume more than 100 times as much power as the FPGA. |
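The shift-register-based serial-to-matrix conversion mentioned in the abstract can be illustrated with a small software model. The sketch below is only an assumption about how such a structure commonly works (a line-buffer style shift register with fixed taps); it is not the thesis's actual design. The names `serial_to_windows`, `width`, and `k` are hypothetical and chosen for illustration.

```python
from collections import deque


def serial_to_windows(pixels, width, k):
    """Model a shift register that turns a raster-ordered pixel stream
    into k x k windows (illustrative sketch, not the thesis's design).

    pixels: iterable of samples in row-major order
    width:  number of samples per image row
    k:      window (kernel) size
    Yields (top_row, left_col, window) for every valid window position.
    """
    # The register must span (k - 1) full rows plus k samples of the newest row.
    depth = (k - 1) * width + k
    reg = deque(maxlen=depth)  # index 0 holds the oldest sample

    for n, sample in enumerate(pixels):
        reg.append(sample)  # one shift per "clock cycle"
        row, col = divmod(n, width)
        # A window is valid once the register is full and the newest
        # sample is at least k - 1 columns into its row (no row wrap).
        if len(reg) == depth and col >= k - 1:
            # Fixed taps: window element (i, j) sits at register index i * width + j.
            window = [[reg[i * width + j] for j in range(k)] for i in range(k)]
            yield row - k + 1, col - k + 1, window


if __name__ == "__main__":
    # A 3 x 4 image holding the values 0..11, scanned with a 2 x 2 window.
    for top, left, win in serial_to_windows(range(12), width=4, k=2):
        print(top, left, win)
```

In this model each input sample is read once and every window is formed from fixed tap positions, which is one way such a structure can reduce I/O bandwidth compared with re-reading overlapping pixels from memory.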