Deep Reinforcement Learning Based On Policy Gradient Optimization And Its Application In Agent Control

Posted on:2022-12-27

Degree:Master

Type:Thesis

Country:China

Candidate:J Chen

Full Text:PDF

GTID:2518306776452554

Subject:Automation Technology

Abstract/Summary:

In recent years, deep reinforcement learning algorithms for agent systems have become a hot research topic in academia, and researchers have achieved many notable results. Among the open problems, how to make an agent system learn the optimal policy stably and efficiently during training is an important topic in policy gradient optimization. First, from the algorithmic point of view, existing algorithms focus mainly on deterministic policies, which suffer from bias in value estimation and handle noise in the reward signal poorly, reducing the stability of the algorithm. Second, from the application point of view, agents in a multi-agent system both cooperate and compete with one another, which makes the environment complex and changeable. When a single-agent algorithm is applied directly to a multi-agent system, existing policy gradient optimization algorithms are limited in their ability to process information about the whole system, which leads to insufficient exploration of the environment. To address these problems, this thesis analyzes the limitations of existing algorithms, improves them, and applies the improved algorithms to complex agent systems. The main work of the thesis is as follows:

First, this thesis proposes the Deep Stochastic Policy Gradient (DSPG) algorithm for the agent policy optimization problem. DSPG targets the limitations of the Deep Deterministic Policy Gradient (DDPG) algorithm: it replaces the deterministic policy gradient of DDPG with a stochastic policy gradient, uses two parallel value networks in place of the single value network to improve the accuracy of action-value estimation, and uses importance sampling to update the parameters of the target policy network, thereby optimizing the training of the policy network. Simulation experiments show that DSPG is more robust and explores the environment more effectively.

Second, this thesis analyzes how multiple agents interact with the environment, identifies two challenges in extending the algorithm from a single-agent system to a multi-agent system, and proposes a solution to each. The first challenge is that each agent in a multi-agent system must take the states and decisions of the other agents into account. Drawing on the centralized training and decentralized execution scheme of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, this thesis resolves the structural problems of the policy and value networks in the multi-agent system, so that the system can be trained more efficiently using the other agents' policy information. The second challenge is the difference in decision-making goals: a multi-agent system no longer seeks to maximize individual rewards but the global reward. This thesis proves that the stochastic policy gradient can be computed for a policy network evaluated by the multi-agent joint value function, and hence that the improved Multi-Agent Deep Stochastic Policy Gradient (MADSPG) algorithm can maximize the global reward of the multi-agent system by gradient descent.

Finally, three multi-agent scenarios are built on the Gym platform: a hunting-escaping environment, an animal world environment, and an action world environment with an added communication mechanism. Experiments show that the improved MADSPG algorithm not only applies successfully to multi-agent environments but also outperforms the traditional MADDPG algorithm in stability, exploration ability and computational efficiency.
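The DSPG ingredients named in the abstract (a stochastic policy gradient, two parallel value networks, and importance sampling) can be illustrated with a minimal, dependency-free Python sketch. It is not the thesis's implementation: the 1-D Gaussian policy, the toy one-step reward -(a - 2)^2, the hyperparameters, and every function name are hypothetical. Taking the minimum of the two critics' estimates is an assumption borrowed from clipped double Q-learning (TD3), since the abstract only says that two parallel value networks are used. The policy mean is updated with the likelihood-ratio (stochastic) policy gradient, and actions drawn from a one-step-stale behaviour policy are reweighted by importance sampling.

```python
import math
import random


def gaussian_log_prob(a: float, mu: float, sigma: float) -> float:
    """Log-density of action a under the Gaussian policy N(mu, sigma^2)."""
    return -0.5 * ((a - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def twin_critic_target(reward: float, q1_next: float, q2_next: float,
                       gamma: float = 0.99, done: bool = False) -> float:
    """TD target built from two parallel critics.
    Assumption: the smaller estimate is used (as in TD3) to curb overestimation."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap


def importance_weight(a: float, mu_new: float, mu_old: float,
                      sigma: float, clip: float = 10.0) -> float:
    """Ratio pi_new(a) / pi_old(a) for an action sampled from the old policy,
    clipped from above to limit variance (the clip value is hypothetical)."""
    log_ratio = gaussian_log_prob(a, mu_new, sigma) - gaussian_log_prob(a, mu_old, sigma)
    return min(math.exp(log_ratio), clip)


def grad_log_prob_mu(a: float, mu: float, sigma: float) -> float:
    """Score function d/dmu log N(a; mu, sigma^2) = (a - mu) / sigma^2."""
    return (a - mu) / sigma ** 2


def train_mean(target: float = 2.0, sigma: float = 0.5, lr: float = 0.05,
               iters: int = 300, batch: int = 64, seed: int = 0) -> float:
    """Off-policy stochastic policy gradient on a hypothetical one-step task
    with reward -(a - target)^2; returns the learned policy mean."""
    rng = random.Random(seed)
    mu = mu_old = 0.0
    for _ in range(iters):
        # Actions come from the behaviour policy, which lags one update behind.
        actions = [rng.gauss(mu_old, sigma) for _ in range(batch)]
        rewards = [-(a - target) ** 2 for a in actions]
        baseline = sum(rewards) / batch  # variance-reducing baseline
        grad = 0.0
        for a, r in zip(actions, rewards):
            w = importance_weight(a, mu, mu_old, sigma)
            grad += w * (r - baseline) * grad_log_prob_mu(a, mu, sigma)
        # Gradient ascent on expected reward; keep the previous mean as behaviour policy.
        mu_old, mu = mu, mu + lr * grad / batch
    return mu
```

With the default arguments, `train_mean()` should move the policy mean close to the hypothetical optimum 2.0. In a full actor-critic method the sampled reward would be replaced by a critic estimate, and both critics would be regressed toward `twin_critic_target`.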
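The centralized-training, decentralized-execution structure and the global-reward objective can likewise be sketched in plain Python. Each actor maps only its own observation to a Gaussian action, while training scores every agent's stochastic policy gradient with one shared joint value of all observations and actions. For brevity, the learned centralized critic is replaced here by a known cooperative value function, `joint_q`; that task, the fixed observations, the linear actors, and all names are hypothetical and not taken from the thesis.

```python
import random
from typing import List


class GaussianActor:
    """Decentralized actor: its action depends only on its own observation."""

    def __init__(self, weight: float = 0.0, sigma: float = 0.3):
        self.weight = weight
        self.sigma = sigma

    def mean(self, obs: float) -> float:
        return self.weight * obs

    def act(self, obs: float, rng: random.Random) -> float:
        return rng.gauss(self.mean(obs), self.sigma)


def joint_q(observations: List[float], actions: List[float]) -> float:
    """Stand-in for a centralized critic Q(o_1..o_n, a_1..a_n).
    Hypothetical cooperative task: the team is rewarded when the
    actions sum to the mean observation."""
    target = sum(observations) / len(observations)
    return -(sum(actions) - target) ** 2


def train_ctde(n_agents: int = 2, iters: int = 400, batch: int = 64,
               lr: float = 0.02, seed: int = 1) -> List[GaussianActor]:
    rng = random.Random(seed)
    actors = [GaussianActor() for _ in range(n_agents)]
    for _ in range(iters):
        samples = []
        for _ in range(batch):
            obs = [1.0] * n_agents  # hypothetical fixed local observations
            acts = [actor.act(o, rng) for actor, o in zip(actors, obs)]
            # Centralized training: the joint value sees every agent's obs and action.
            samples.append((obs, acts, joint_q(obs, acts)))
        baseline = sum(q for _, _, q in samples) / batch
        grads = [0.0] * n_agents
        for obs, acts, q in samples:
            for i, actor in enumerate(actors):
                # Score of agent i's own policy w.r.t. its weight, scaled by the shared value.
                score = (acts[i] - actor.mean(obs[i])) / actor.sigma ** 2 * obs[i]
                grads[i] += (q - baseline) * score
        for actor, g in zip(actors, grads):
            actor.weight += lr * g / batch  # ascend the global (team) value
    return actors
```

After training, execution is decentralized: each agent calls `actor.act(own_obs, rng)` with its own observation only. In this toy task any pair of weights summing to 1 is optimal, so only their sum is determined by the team objective, not the individual values.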

Keywords/Search Tags:

Multi-agent systems, Deep reinforcement learning, MADDPG, Stochastic policy, Policy gradient optimization

Related items

1	Research On Agent Decision-making And Control Based On Deep Reinforcement Learning
2	Research On Multiagent Cooperation And Applications Based On Reinforcement Learning
3	Research On Multiagent Policy Optimization Based On Deep Reinforcement Learning
4	Research On Fast Policy Gradient Algorithms Of Reinforcement Learning Based On Adaptive Learning Rate
5	Optimization On Deep Reinforcement Learning Based On Policy Gradient
6	Research On Multi-Agent Pursuit-Evasion Based On Deep Reinforcement Learning
7	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
8	Exploration Strategy Of Deterministic Policy In Deep Reinforcement Learning
9	Research And Implementation On Game Control Algorithm Based On Deepening Reinforcement Learning
10	Gait Analysis Of Quadruped Robot Based On Deep Reinforcement Learning