
The Hardware Acceleration Technology Research Of Deep Learning Algorithm Based On The Multicore DSP

Posted on: 2017-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: K Ji
Full Text: PDF
GTID: 2428330569998712
Subject: Electronic Science and Technology
Abstract/Summary:
A DSP is a low-power, special-purpose processor and an emerging hardware platform distinct from general-purpose processors. Optimizing deep learning algorithms for DSP platforms can substantially advance the development of miniaturized, low-power, integrated intelligent devices. This thesis studies architecture adaptation and parallel optimization techniques for deep learning algorithms, with the goal of hardware acceleration on a high-performance multicore DSP platform.

First, the thesis studies the architecture of a high-performance parallel processing system built on a multicore DSP and proposes a low-power DSP structure with a three-level parallel architecture. It designs a DMA-based parallel mechanism and implements the corresponding system functions. Next, for the deep belief network (DBN), it studies parallel algorithm techniques under this multi-level parallel DSP structure, proposes a DBN parallel algorithm based on large-scale matrix operations and the multicore parallel mechanism, and presents a concrete implementation. Experiments show that DBN pretraining on the DSP reaches a throughput of 989.22 images per second, with a power efficiency 6 times that of mainstream general-purpose microprocessors.

Then, for the convolutional neural network (CNN), the thesis discusses two ways of computing convolutions on the DSP, conversion to FFT and conversion to matrix multiplication, and proposes a multicore DSP parallel acceleration algorithm based on convolution matrix expansion. Experiments show that computing the convolutional layer channel by channel with coarse-grained parallelism gives the highest performance.

Finally, starting from Caffe, a typical deep learning programming framework, the thesis studies a DSP-oriented version of the Caffe programming framework and verifies it on three convolutional neural networks for image recognition: Cifar-10, AlexNet and VGG-s. Experiments show throughputs of 404.86, 6.35 and 2.37 images per second respectively, with power efficiencies 4.77, 2.60 and 1.97 times those of mainstream general-purpose microprocessors. The framework supports automatic mapping of the main Caffe constructs to their DSP implementations.
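The abstract says the DBN parallel algorithm rests on large matrix operations combined with a multicore parallel mechanism. One common way to split a matrix product C = A·B across cores is to give each core a contiguous block of rows of A, so no two cores write the same output. The sketch below simulates that partition sequentially in Python. The row-block scheme, the core count and the helper names `partition_rows`, `matmul_rows` and `parallel_matmul` are illustrative assumptions, not the thesis's DSP implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def partition_rows(n_rows: int, n_cores: int) -> List[Tuple[int, int]]:
    """Split row indices [0, n_rows) into n_cores contiguous, near-equal blocks.

    The first (n_rows % n_cores) cores receive one extra row; cores may get
    an empty block when n_cores > n_rows.
    """
    base, extra = divmod(n_rows, n_cores)
    blocks = []
    start = 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def matmul_rows(a: Matrix, b: Matrix, start: int, stop: int) -> Matrix:
    """Compute rows [start, stop) of a @ b: the work one core would do."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(start, stop)
    ]


def parallel_matmul(a: Matrix, b: Matrix, n_cores: int) -> Matrix:
    """Hypothetical multicore GEMM, simulated sequentially: each 'core'
    computes its own row block, and blocks are concatenated in core order."""
    result: Matrix = []
    for start, stop in partition_rows(len(a), n_cores):
        result.extend(matmul_rows(a, b, start, stop))
    return result
```

For example, `partition_rows(10, 4)` assigns rows 0-2, 3-5, 6-7 and 8-9 to four cores. Every core reads all of B but writes a disjoint slice of C, so the only synchronization needed is the final join.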
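DBN pretraining is normally done greedily, one restricted Boltzmann machine (RBM) layer at a time, using contrastive divergence. Written for a mini-batch, each update is a handful of matrix products, which is why a large-matrix formulation suits a DSP. The sketch below shows one CD-1 weight update in that batched form. It assumes mean-field probabilities instead of binary samples, which makes it deterministic, and it omits bias updates for brevity. The helper names are hypothetical, and this is not presented as the thesis's algorithm.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def sigmoid_bias(m: Matrix, bias: List[float]) -> Matrix:
    """Element-wise logistic sigmoid of m + bias (bias broadcast over rows)."""
    return [[1.0 / (1.0 + math.exp(-(x + b))) for x, b in zip(row, bias)] for row in m]


def cd1_step(v0: Matrix, w: Matrix, b_vis: List[float], b_hid: List[float],
             lr: float) -> Matrix:
    """One mean-field CD-1 update of w (n_visible x n_hidden) for a
    mini-batch v0 (batch x n_visible). Returns the updated weights."""
    h0 = sigmoid_bias(matmul(v0, w), b_hid)             # hidden probs, batch x hidden
    v1 = sigmoid_bias(matmul(h0, transpose(w)), b_vis)  # reconstruction of v0
    h1 = sigmoid_bias(matmul(v1, w), b_hid)             # hidden probs of reconstruction
    pos = matmul(transpose(v0), h0)                     # positive phase, visible x hidden
    neg = matmul(transpose(v1), h1)                     # negative phase
    n = len(v0)
    return [
        [w_ij + lr * (p - q) / n for w_ij, p, q in zip(w_row, p_row, q_row)]
        for w_row, p_row, q_row in zip(w, pos, neg)
    ]
```

All four data-dependent steps are matrix products over the whole batch, so the row-partitioning idea used for GEMM applies to each of them directly.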
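The abstract reports power efficiency only as a multiple of mainstream general-purpose microprocessors and does not define the metric. Assuming the usual definition of throughput $T$ (images per second) divided by power $P$ (watts), the reported ratio would be:

```latex
\eta = \frac{T}{P}\quad\left[\frac{\text{images/s}}{\text{W}}\right],
\qquad
\frac{\eta_{\mathrm{DSP}}}{\eta_{\mathrm{CPU}}}
  = \frac{T_{\mathrm{DSP}} / P_{\mathrm{DSP}}}{T_{\mathrm{CPU}} / P_{\mathrm{CPU}}}
```

For DBN pretraining, for instance, $T_{\mathrm{DSP}} = 989.22$ images/s and the reported ratio is 6. The abstract gives neither power figure, so $P_{\mathrm{DSP}}$ and $P_{\mathrm{CPU}}$ cannot be recovered from it.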
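One of the two convolution strategies the abstract discusses converts convolution to the frequency domain. By the convolution theorem, the linear convolution of a length-m and a length-n sequence equals the inverse DFT of the pointwise product of their DFTs, once both are zero-padded to at least m + n - 1 points. The 1-D sketch below checks this against direct convolution using a textbook recursive radix-2 FFT. It illustrates the principle only. The helper names are hypothetical, and a DSP implementation would use an optimized 2-D FFT library rather than this code.

```python
import cmath
from typing import List


def fft(x: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    With invert=True it computes the unscaled inverse transform."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def conv_direct(a: List[float], b: List[float]) -> List[float]:
    """Direct linear convolution, O(len(a) * len(b))."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def conv_fft(a: List[float], b: List[float]) -> List[float]:
    """Linear convolution via zero-padded FFT, pointwise product, inverse FFT."""
    length = len(a) + len(b) - 1
    n = 1
    while n < length:  # next power of two >= m + n - 1
        n *= 2
    fa = fft([complex(v) for v in a] + [0j] * (n - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (n - len(b)))
    inv = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [v.real / n for v in inv[:length]]  # scale inverse and drop padding
```

Zero-padding to at least m + n - 1 points is what turns the DFT's circular convolution into the linear convolution a CNN layer needs.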
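The abstract's proposed CNN algorithm is based on convolution matrix expansion. A widely used form of this idea, usually called im2col, copies each receptive field of the input into one column of a matrix, so that a whole convolutional layer becomes a single product of the flattened filters with that matrix. Caffe's own convolution layer is built on an im2col step followed by GEMM. The sketch below assumes one input image in channel-height-width layout, stride 1 and no padding, and it computes cross-correlation (no kernel flip), as CNN frameworks do. The names `im2col` and `conv2d_gemm` are hypothetical.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [channel][row][col]


def im2col(x: Tensor3, kh: int, kw: int) -> List[List[float]]:
    """Expand x (C x H x W) into a (C*kh*kw) x (out_h*out_w) matrix, stride 1,
    no padding. Column p holds the receptive field of output pixel p."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for ch in range(c):
        for di in range(kh):
            for dj in range(kw):
                cols.append([x[ch][i + di][j + dj]
                             for i in range(out_h) for j in range(out_w)])
    return cols


def conv2d_gemm(x: Tensor3, kernels: List[Tensor3]) -> Tensor3:
    """Convolution layer as one matrix product (M x C*kh*kw) @ (C*kh*kw x P).
    kernels[m] is the C x kh x kw filter for output channel m."""
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    cols = im2col(x, kh, kw)
    # Flatten each filter in the same (channel, di, dj) order im2col uses.
    weights = [[v for ch in k for row in ch for v in row] for k in kernels]
    out = []
    for w_row in weights:  # one row of the weight matrix -> one output channel
        flat = [sum(wv * cols[r][p] for r, wv in enumerate(w_row))
                for p in range(out_h * out_w)]
        out.append([flat[i * out_w:(i + 1) * out_w] for i in range(out_h)])
    return out
```

Each row of the weight matrix produces one output channel independently of the others, so a coarse-grained multicore split could hand whole output channels to each core. The abstract does not say whether its channel-by-channel scheme partitions input or output channels.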
Keywords/Search Tags: DSP, Deep Learning, Hardware Acceleration, Power Efficiency, Programming Framework