| In recent years, with the development of artificial intelligence technology, the research and application of machine learning have attracted wide attention. As an important branch of machine learning, reinforcement learning is at the forefront of current research. Most reinforcement learning algorithms currently run in the cloud and rely on high-performance GPUs, such as those from NVIDIA, to accelerate training. However, with the rapid development of edge AI, the demand for hardware acceleration of Deep Reinforcement Learning (DRL) algorithms is also increasing. The main work and innovations of this thesis are as follows: 1. The design of a multi-core accelerator is studied. Because a reinforcement learning algorithm contains multiple networks, a flexible and configurable heterogeneous dual-core architecture is proposed. During inference, the two cores compute as a parallel pipeline, which reduces accesses to external storage and thereby both accelerates inference and lowers memory access power. During training, model parallelism is used to run the different neural networks, so that the multiple networks of a reinforcement learning algorithm can be trained. Meanwhile, the accelerator performs parallel computation at the channel level to improve throughput. Experimental results show that, compared with a high-performance CPU platform and a GPU platform, the training speed is improved by 5.37 times and 1.05 times, respectively. 2. The computation of convolution operators in reinforcement learning is studied. It is found that forward propagation requires small convolution kernels, whereas back propagation requires large convolution kernels. In this thesis, a configurable convolution operator is designed that computes large convolution kernels through the cooperative operation of several small convolution kernels. The kernel size can be configured from a minimum of 1×1 to a maximum of 49×49. 3. To address the problem of efficiently deploying reinforcement learning models on limited hardware resources, this thesis establishes a resource and bandwidth analysis model and uses it to explore the design space of reinforcement learning accelerators based on FPGA hardware resources and network models, in order to obtain a better algorithm-to-hardware deployment method. |
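
The idea in item 2, computing a large-kernel convolution through the cooperation of several small-kernel computations, can be illustrated in software. The following is a minimal sketch, not the accelerator design itself: the function names `conv2d_valid` and `conv2d_tiled` and the tile size of 3 are assumptions chosen for illustration, and the thesis does not state which small-kernel size its hardware uses. The sketch splits a large kernel into small tiles, runs each tile over the correspondingly shifted input window, and accumulates the partial results.

```python
def conv2d_valid(image, kernel):
    """Direct 'valid' 2D cross-correlation on lists of lists (reference)."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def conv2d_tiled(image, kernel, tile=3):
    """Same result, built from small sub-kernels of at most tile x tile.

    `tile` is a hypothetical parameter standing in for the size of the
    small convolution unit; it is not taken from the thesis.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for ti in range(0, kh, tile):
        for tj in range(0, kw, tile):
            # Small sub-kernel; tiles on the bottom/right edge may be smaller.
            sub = [row[tj:tj + tile] for row in kernel[ti:ti + tile]]
            sh, sw = len(sub), len(sub[0])
            # Input window seen by this sub-kernel, shifted by (ti, tj).
            window = [row[tj:tj + ow + sw - 1]
                      for row in image[ti:ti + oh + sh - 1]]
            partial = conv2d_valid(window, sub)
            # Accumulate the partial sums from each small-kernel pass.
            for r in range(oh):
                for c in range(ow):
                    out[r][c] += partial[r][c]
    return out
```

For integer inputs, `conv2d_tiled(image, kernel, tile=3)` returns exactly the same output as `conv2d_valid(image, kernel)`, including when the kernel size is not a multiple of the tile size. In hardware, the per-tile passes could run on separate small convolution units with a shared accumulator, but that mapping is an assumption here rather than a description of the thesis's architecture.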