
Hardware Implementation And Application Of Reinforcement Learning Algorithm For Online Decision

Posted on: 2021-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: X P Li
Full Text: PDF
GTID: 2428330611498116
Subject: Instrumentation engineering
Abstract/Summary:
Online decision-making lets intelligent agents make autonomous decisions without human intervention. It has broad application prospects in military and civil fields such as UAV maneuvering decisions, robot control, and autonomous driving. Compared with traditional decision-making algorithms such as expert systems, algorithms based on deep reinforcement learning can learn online and achieve end-to-end perception and decision-making, so they have received more attention in applications. However, deep reinforcement learning is computationally intensive and is usually trained on GPUs, which makes it difficult to deploy on edge-side systems with limited computing resources and low-power requirements. To this end, this thesis targets online decision-making applications and studies the hardware implementation and application of FPGA-based reinforcement learning algorithms. The main research work is as follows:

1. Focusing on the design requirements, the overall research plan was determined by analyzing the structure of a typical deep reinforcement learning algorithm, Deep Q-Network (DQN), and by evaluating FPGA computing resources. This work proposed a hardware implementation architecture based on hardware/software collaborative computing, decomposed the algorithm acceleration task, settled on a streaming (pipelined) computing structure for the hardware accelerator, and defined the method for application verification.

2. Based on the hardware implementation architecture and the acceleration task decomposition scheme, the DQN hardware accelerator was designed. This accelerator is the core research content of the hardware implementation. DQN hardware acceleration combines the computing characteristics of network inference and network training, and it must also account for data dependences and memory access bandwidth in parallel computing. Following an inside-out design approach, the operator units, computing modules, and control module of the accelerator were designed, and the complete accelerator was packaged and simulated to ease hardware implementation for different decision-making applications.

3. For the designed DQN hardware accelerator, the design space of the parallel computing parameters was explored. Combining the characteristics of FPGA resources with the neural network structure of the DQN algorithm, the resources and computation time consumed by the accelerator were modeled and analyzed to find the best parallel computing parameters for a given application. The accelerator was then integrated into the system as an IP core, and the scheduling design for the hardware implementation of the DQN algorithm was completed.

4. Two applications were verified: inverted pendulum control and UAV ground-attack maneuver decision-making. The verification consisted of four parts: application analysis, application environment modeling, accelerator parameter exploration and optimization, and performance analysis. The test results show that the design is functionally correct and meets the design requirements for decision time and power consumption. Training time and power consumption were also compared with CPU and GPU platforms, and the results show that the FPGA has certain advantages in both.
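
The design-space exploration described in item 3 can be illustrated with a minimal sketch. The abstract does not give the actual resource or timing models, so everything below is an assumption for illustration only: the layer sizes, the DSP budget, the one-DSP-per-multiplier cost model, and the names `LAYERS`, `DSP_BUDGET`, `cycles`, `dsp_cost`, and `explore` are all hypothetical. The sketch enumerates pairs of input/output parallelism factors for a small fully connected network, estimates cycles and DSP usage for each, and keeps the fastest configuration that fits the budget.

```python
from itertools import product
from math import ceil

# Hypothetical fully connected layer sizes (inputs, outputs); not from the thesis.
LAYERS = [(4, 64), (64, 64), (64, 2)]
# Assumed number of DSP slices available on the target FPGA.
DSP_BUDGET = 220


def cycles(p_in, p_out):
    """Estimate cycles for one forward pass with p_in x p_out parallel multipliers."""
    return sum(ceil(n_in / p_in) * ceil(n_out / p_out) for n_in, n_out in LAYERS)


def dsp_cost(p_in, p_out):
    """Assumed cost model: one DSP slice per parallel multiply-accumulate unit."""
    return p_in * p_out


def explore(candidates=(1, 2, 4, 8, 16, 32, 64)):
    """Return (cycles, dsps, p_in, p_out) for the fastest configuration within budget."""
    feasible = [
        (cycles(p_in, p_out), dsp_cost(p_in, p_out), p_in, p_out)
        for p_in, p_out in product(candidates, repeat=2)
        if dsp_cost(p_in, p_out) <= DSP_BUDGET
    ]
    # Tuples compare by cycles first, then by DSP usage as a tie-breaker.
    return min(feasible)


best = explore()
print(best)
```

A real model would also need to cover the training (backward) pass, on-chip memory, and access bandwidth, which the abstract names as constraints but does not quantify.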
Keywords/Search Tags: Online decision, Reinforcement learning, Hardware acceleration, Inverted pendulum control decision