
The Research And Implementation Of Deep Learning Heterogeneous Computing Platform Based On CPU And Multiple FPGA Architecture

Posted on: 2020-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: S J Zhou  Full Text: PDF
GTID: 2428330572472131  Subject: Electronic Science and Technology
Abstract/Summary:
Deep learning has achieved great success in a wide range of applications whose computational demands vary widely, which calls for a highly flexible hardware solution for deep learning systems. This dissertation proposes a heterogeneous computing platform based on a CPU and a flexible number of FPGAs to support the implementation of deep learning algorithms at different scales. It gives an overview of the platform design, addresses the key techniques used in the proposed platform, and presents an application to a rare sound detection scenario.

Starting from an analysis of the data flow and workflow, the components of the proposed platform are introduced. The platform consists of three sub-systems: a CPU sub-system that controls data stream transmission, an FPGA sub-system that implements the core computations of the deep learning algorithm on a general hardware architecture, and a PCIe bus communication component that connects the two. By adding different numbers of FPGAs, the platform provides higher and more flexible computational power.

Because the computing speed of the FPGAs determines the performance of the whole platform, the design of the FPGA sub-system, as the core computing unit, is addressed in further detail. Three key techniques are used in the FPGA design to improve the parallelism of the platform: multi-level pipelining, DDR3 SDRAM data scheduling, and bus data transmission. Multi-level pipelining enables parallel operation both between individual FPGAs and between the compute modules within a single FPGA, increasing the amount of data the platform processes per second. DDR3 SDRAM data scheduling reads data from the external DDR3 SDRAM and distributes it to the computing modules working in parallel. Bus data transmission uses data prefetching so that FPGA computation and data transfer proceed in parallel.

The proposed platform is then applied to a rare sound detection task that uses a deep learning algorithm. The experiment not only compares the results obtained on a CPU with those obtained on the proposed platform, but also measures the platform's energy efficiency. Although almost 30% of the energy is consumed by the PCIe bus, the overall power consumption of the platform is only 1.6× that of a CPU-only system while a 15× speedup of computation is gained, which demonstrates the better energy efficiency of the proposed platform. Based on these results, the proposed heterogeneous computing platform demonstrates the capacity to accommodate different numbers of FPGAs, providing the flexibility needed to implement deep learning algorithms at different scales.
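As a rough illustration of the energy-efficiency claim above: with about 1.6× the power of a CPU-only system and a 15× speedup, the energy per task is roughly 1.6 / 15 ≈ 0.11 of the CPU-only energy, i.e. on the order of a 9× improvement in energy efficiency.

The bus data transmission technique overlaps FPGA computation with data transfer through prefetching. The following host-side sketch is not taken from the thesis; it only illustrates the general double-buffering idea, and the function names (transfer_to_fpga, run_fpga_compute) are illustrative placeholders for the platform's PCIe and FPGA control routines.

# Hypothetical sketch of double-buffered prefetching: while the FPGA computes
# on one buffer, the next data chunk is transferred into the other buffer.
import threading

def transfer_to_fpga(buffer_id, chunk):
    # Placeholder for a PCIe DMA write into the on-board DDR3 SDRAM.
    pass

def run_fpga_compute(buffer_id):
    # Placeholder for launching the FPGA computing modules on one buffer.
    pass

def process(chunks):
    # Prefetch chunk i+1 into one buffer while the FPGA computes on the other,
    # so calculation and data transmission proceed in parallel.
    if not chunks:
        return
    transfer_to_fpga(0, chunks[0])  # fill the first buffer up front
    for i in range(len(chunks)):
        buf, nxt = i % 2, (i + 1) % 2
        prefetch = None
        if i + 1 < len(chunks):
            prefetch = threading.Thread(
                target=transfer_to_fpga, args=(nxt, chunks[i + 1]))
            prefetch.start()        # transfer the next chunk in parallel
        run_fpga_compute(buf)       # compute on the current buffer
        if prefetch is not None:
            prefetch.join()         # make sure the prefetch has completed

With stalls only when a transfer takes longer than the corresponding computation, this kind of overlap is what allows the FPGA calculation and the bus data transmission to run concurrently, as described in the abstract.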
Keywords/Search Tags:Deep Learning, Heterogeneous Computation, FPGA, Parallel Computation, Pipeline, Hardware Acceleration