Efficient Hardware Implement For DRL Algorithm

Posted on:2024-07-02

Degree:Master

Type:Thesis

Country:China

Candidate:X G Shi

Full Text:PDF

GTID:2568307103472984

Subject:Electronic information

Abstract/Summary:


In recent years, with the development of artificial intelligence technology, the research and application of machine learning have attracted wide attention from researchers. Reinforcement learning, one of the important branches of machine learning, is at the forefront of that research. Currently, most reinforcement learning algorithms run in the cloud and use high-performance GPUs, such as those from NVIDIA, to accelerate training. However, with the rapid development of edge AI, the demand for hardware acceleration of Deep Reinforcement Learning (DRL) algorithms is also increasing. The main work and innovations of this thesis are as follows:

1. A multi-core accelerator design is studied. Because a reinforcement learning algorithm contains multiple networks, a flexible and configurable heterogeneous dual-core architecture is proposed. During inference, the two cores compute as a parallel pipeline, which reduces accesses to external memory; this both accelerates inference and lowers memory-access power. During training, model-parallel computation is used to train the different neural networks of the reinforcement learning algorithm. Meanwhile, the accelerator computes in parallel at the channel level to improve throughput. Experimental results show training speedups of 5.37× and 1.05× over a high-performance CPU platform and a GPU platform, respectively.

2. The convolution computations in reinforcement learning are analyzed. Forward propagation is found to require small-kernel convolutions, whereas back propagation requires large-kernel convolutions. A configurable convolution operator is designed that computes a large convolution kernel through the cooperative operation of several small convolution kernels. The kernel size can be configured from a minimum of 1×1 to a maximum of 49×49.

3. To deploy reinforcement learning models efficiently on limited hardware resources, a resource and bandwidth analysis model is established. Based on the FPGA hardware resources and the network model, it is used to explore the design space of the reinforcement learning accelerator and to obtain a better hardware deployment scheme for the algorithm.
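The idea in the second contribution, computing a large convolution kernel through several cooperating small kernels, can be illustrated with a short sketch. The abstract does not describe the operator's actual dataflow, so the following is only an assumed software model: it takes a stride-1 "valid" cross-correlation, splits the large kernel into tiles of at most `tile` × `tile`, runs each tile as a small convolution over a correspondingly shifted input window, and accumulates the partial results. The function names `conv2d_valid` and `conv2d_tiled` and the tile size are hypothetical.

```python
def conv2d_valid(x, k):
    """Direct stride-1 'valid' cross-correlation of 2-D input x with kernel k."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    oh, ow = h - kh + 1, w - kw + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def conv2d_tiled(x, k, tile):
    """Same result as conv2d_valid, built only from small-kernel convolutions.

    Assumption: the large kernel is cut into sub-kernels of at most
    tile x tile; each sub-kernel sees the input shifted by its offset
    inside the large kernel, and the partial outputs are summed.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for ta in range(0, kh, tile):          # row offset of the tile in the kernel
        for tb in range(0, kw, tile):      # column offset of the tile in the kernel
            sub = [row[tb:tb + tile] for row in k[ta:ta + tile]]
            sh, sw = len(sub), len(sub[0])
            # Input window whose small-kernel output is exactly oh x ow.
            win = [row[tb:tb + ow + sw - 1] for row in x[ta:ta + oh + sh - 1]]
            part = conv2d_valid(win, sub)
            for i in range(oh):
                for j in range(ow):
                    out[i][j] += part[i][j]
    return out


if __name__ == "__main__":
    # Illustrative integer data: an 8x8 input and a 6x6 kernel built from 3x3 tiles.
    x = [[(3 * r + 5 * c) % 7 for c in range(8)] for r in range(8)]
    k = [[(r * c + 1) % 4 for c in range(6)] for r in range(6)]
    print(conv2d_tiled(x, k, 3) == conv2d_valid(x, k))
```

In this model, a kernel whose side is not a multiple of `tile` simply produces smaller tiles at its edges. Whether the thesis handles that case the same way, or how it schedules tiles across the two cores, is not stated in the abstract.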

Keywords/Search Tags:

Reinforcement learning, Hardware acceleration, Channel-level parallelism, Heterogeneous dual-core, Modeling analysis


Related items

1	Performance Estimation Of Multithreaded System On Heterogeneous Multi-core
2	Research On Hardware Techniques For Thread Level Parallelism
3	Research On Memory-level Parallelism For Multi-core Microprocessor Chip
4	Study Of Heterogeneous Multi-core Acceleration Methods For Convolutional Neural Networks On Reconfigurable Platform
5	Research On FPGA Hardware Acceleration Platform For Reinforcement Learning
6	Design And Research Of Heterogeneous Dual-Core Dual Video Data Processing Platform
7	Research On A Countermeasure Against Power Analysis Attack For Heterogeneous Cryptosystems
8	Parallel Algorithm Design And Optimization For H.264 Video Encoding
9	Design Of Hardware Acceleration IP Core For RC4 Encryption Algorithm
10	A Study Of Data Scheduling Management Strategy In Heterogeneous System Based On Hardware Acceleration