
Research On Deep Reinforcement Learning Algorithm Based On Dual-Agent Cooperation

Posted on: 2024-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhu
Full Text: PDF
GTID: 2568307064985839
Subject: Software engineering
Abstract/Summary:
Actor-critic deep reinforcement learning algorithms such as DDPG and TD3 are typically used to solve continuous-control tasks in a single-agent setting, while multi-agent reinforcement learning algorithms handle multi-agent tasks through cooperation among agents. This thesis investigates whether the idea of multi-agent cooperation can also be used to solve single-agent tasks; the core problem is how to achieve effective cooperation among multiple agents on a single-agent task. Preliminary experiments showed that several earlier ideas, such as having both agents interact with the environment, sharing a replay buffer, and using all critics to compute an average or minimum target Q-value, do not make the agents cooperate effectively. Taking two agents as the research object, this thesis therefore studies how they can collaborate effectively on single-agent tasks. The main work is as follows:

1. To apply the idea of dual-agent cooperation to single-agent tasks, this thesis constructs a dual-agent structure within a single-agent task and designs a Move Car experimental environment to examine whether the two agents cooperate effectively. The experiments show that several earlier collaborative ideas bring limited performance improvement and can even destabilize policy learning, whereas the collaboration method proposed here helps the policy escape local optima, learn a smoother value function, and makes the policy learning process more stable.

2. Since only one agent can interact with the environment in a single-agent task, while the dual-agent structure contains two agents, this thesis proposes a joint actor that selects between the two agents' actions according to value-function criteria and uses it to interact with the environment, solving the interaction problem of two agents on single-agent tasks (see the first sketch below).

3. For collaborative learning between the critics in the dual-agent structure, this thesis first proposes the Maxmin Critic method to reduce the critics' value-estimation bias, but finds that it enlarges the upper bound of the error between the learned value function and the optimal value function. To alleviate this, the thesis proposes the Soft Maxmin Critic method (see the second sketch below) and proves theoretically that it controls the growth of the error upper bound while reducing the value-estimation error.

4. For cooperative learning between the actors in the dual-agent structure, this thesis proposes a Joint Actor Imitation mechanism: based on the agents' value estimates, it constructs a joint actor with the larger estimated value to guide each actor's learning (see the final sketch below). It is proved theoretically that this mechanism not only yields a higher expected cumulative return, but also guarantees a monotonic increase of the expected cumulative return during policy learning.

5. The dual-agent structure and cooperation pattern are applied to TD3 and DDPG respectively, demonstrating their generality. Performance comparisons with classical reinforcement learning algorithms and recent algorithms that improve value-estimation bias demonstrate the effectiveness of this work. In addition, ablation studies, hyperparameter experiments, comparisons of different operators, and bias-analysis experiments are conducted to analyze the work comprehensively.
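To make point 2 concrete, the sketch below shows one plausible way a joint actor could select between the two agents' proposed actions: execute whichever action receives the higher Q-estimate from its agent's critic. This is an illustrative assumption; the abstract does not specify the thesis's exact selection criterion, and all names here (joint_actor_select, actor_1, critic_1, ...) are hypothetical.

```python
import numpy as np

def joint_actor_select(state, actor_1, actor_2, critic_1, critic_2):
    """Return the action to execute in the environment: whichever agent's
    proposed action its critic scores higher. Purely illustrative; the
    thesis's actual value-function criterion is not given in the abstract."""
    a1 = actor_1(state)
    a2 = actor_2(state)
    q1 = critic_1(state, a1)
    q2 = critic_2(state, a2)
    return a1 if q1 >= q2 else a2

# Toy usage with linear stand-ins for the actor/critic networks:
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=4), rng.normal(size=4)
actor_1 = lambda s: float(np.tanh(s @ w1))
actor_2 = lambda s: float(np.tanh(s @ w2))
critic_1 = lambda s, a: float(s.sum()) + a
critic_2 = lambda s, a: float(s.mean()) + a
state = rng.normal(size=4)
print(joint_actor_select(state, actor_1, actor_2, critic_1, critic_2))
```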
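For point 3, one plausible reading of a "soft" Maxmin operator is a temperature-weighted soft minimum over the critics' target Q-values, which interpolates between the hard minimum (a Maxmin-style target, prone to underestimation) and the mean. This operator and its temperature parameter are assumptions for illustration only, not taken from the thesis.

```python
import numpy as np

def soft_min(qs, temperature=0.5):
    """Temperature-weighted soft minimum of per-critic target Q-values.
    As temperature -> 0 it approaches the hard minimum; larger temperatures
    pull the estimate toward the mean, one way to trade off estimation bias
    against the error upper bound discussed in the abstract."""
    qs = np.asarray(qs, dtype=float)
    w = np.exp(-(qs - qs.min()) / temperature)  # shift for numerical stability
    w /= w.sum()
    return float(w @ qs)

# A TD3-style bootstrapped target built on the softened operator:
reward, gamma, done = 1.0, 0.99, False
target_q = reward + gamma * (1.0 - done) * soft_min([3.0, 5.0])
print(soft_min([3.0, 5.0], temperature=0.1))   # ~3.0, near the hard minimum
print(soft_min([3.0, 5.0], temperature=10.0))  # ~3.9, pulled toward the mean
print(target_q)
```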
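For point 4, the sketch below illustrates one way imitation targets could be formed: for each state, the higher-valued of the two agents' proposed actions becomes a regression target that each actor is pulled toward alongside its usual objective. Again, imitation_targets and the squared-error imitation term mentioned in the docstring are hypothetical readings of the Joint Actor Imitation mechanism.

```python
import numpy as np

def imitation_targets(states, actor_1, actor_2, critic_1, critic_2):
    """For each state, take the higher-valued of the two agents' proposed
    actions as an imitation target. Each actor's loss could then add a term
    like lam * (pi_i(s) - target)**2 so that the joint actor with the larger
    value estimate guides both policies (an assumed loss form)."""
    targets = []
    for s in states:
        a1, a2 = actor_1(s), actor_2(s)
        targets.append(a1 if critic_1(s, a1) >= critic_2(s, a2) else a2)
    return np.asarray(targets)

# Toy usage with the same kind of linear stand-ins as above:
rng = np.random.default_rng(1)
w1, w2 = rng.normal(size=4), rng.normal(size=4)
actor_1 = lambda s: float(np.tanh(s @ w1))
actor_2 = lambda s: float(np.tanh(s @ w2))
critic_1 = lambda s, a: float(s.sum()) + a
critic_2 = lambda s, a: float(s.mean()) + a
print(imitation_targets(rng.normal(size=(5, 4)), actor_1, actor_2, critic_1, critic_2))
```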
Keywords/Search Tags:reinforcement learning, deep reinforcement learning, actor-critic, continuous control task, value estimation bias