
Research On Multi-goal-conditioned Method In Reinforcement Learning With Sparse Rewards

Posted on: 2022-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q W He
Full Text: PDF
GTID: 2518306323966729
Subject: Cyberspace security
Abstract/Summary:
Deep Reinforcement Learning (DRL) studies a class of methods in which an agent obtains rewards by interacting with its environment and maximizes those rewards to find an optimal policy for optimal control. It is considered one of the important paths toward general artificial intelligence. The sparse-reward problem is one of the key unsolved challenges in DRL: in a sparse-reward environment, the agent receives no reward until the sequential decision process terminates, and the intermediate steps provide no effective feedback, which leads to underfitting of the policy network, slow training, and high cost. This thesis studies multi-goal-conditioned DRL under sparse rewards, in which the agent obtains virtual rewards under the guidance of a large number of virtual goals and gradually transitions from the virtual goals to the real goal, thereby overcoming the sparse-reward problem. Compared with existing methods, this thesis refines how virtual goals are set and accounts for differences among the goals' spatial value distributions, reducing the coupling error among goals and improving exploration efficiency in the early stage of training.

Firstly, this thesis proposes a reward-filtering method based on the multi-goal value distribution to reduce the coupling error (sketched below). By introducing quantile regression and the Wasserstein metric, it dynamically eliminates reward-signal interference from "distant" goals, i.e. goals whose spatial value distributions differ significantly from that of the current training goal. The method drops the static spatial-value-distribution assumption made by existing methods and effectively improves the performance of multi-goal-conditioned DRL under sparse rewards.

Secondly, this thesis proposes an exploration-enhancement method based on the maximum-entropy model to improve goal-exploration efficiency in the early stage (sketched below). By introducing a maximum-entropy policy with temperature adjustment, it reduces both the interference from already-successful goals and the randomness of the policy, which improves the stability of the overall training process and the utilization of rewards in later stages. As a result, the method achieves better task-completion performance than existing methods, clearly ahead of the baselines.

Finally, this thesis explores a distributed optimization scheme for multi-goal-conditioned DRL under sparse rewards (sketched below). We find that asynchronous parallel gradient updates significantly improve CPU utilization and shorten training time by more than half with little loss of performance, providing higher throughput under multithreading.
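The filtering step can be illustrated with a minimal sketch. This is not the thesis's actual code: it assumes each goal's value distribution is represented by the same number of equally weighted quantile atoms (as in quantile-regression value learning), under which the 1-Wasserstein distance reduces to the mean absolute difference between sorted quantiles; the names quantile_fn, candidate_goals, and w1_threshold are hypothetical.

```python
import numpy as np

def w1_distance(quantiles_a: np.ndarray, quantiles_b: np.ndarray) -> float:
    """1-Wasserstein distance between two value distributions, each
    represented by the same number of equally weighted quantile atoms."""
    a = np.sort(quantiles_a)
    b = np.sort(quantiles_b)
    return float(np.mean(np.abs(a - b)))

def filter_relabelled_goals(current_goal_quantiles, candidate_goals,
                            quantile_fn, w1_threshold=1.0):
    """Keep only relabelled (virtual) goals whose value distribution is
    close, in W1, to the current training goal's; "distant" goals are
    dropped so their reward signal cannot couple into the update."""
    kept = []
    for goal in candidate_goals:
        if w1_distance(current_goal_quantiles, quantile_fn(goal)) <= w1_threshold:
            kept.append(goal)
    return kept
```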
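For the maximum-entropy component, one plausible reading of "temperature adjustment" is SAC-style automatic tuning of the entropy coefficient, sketched below; target_entropy, the learning rate, and the use of PyTorch are assumptions, not details taken from the abstract.

```python
import torch

log_alpha = torch.zeros(1, requires_grad=True)   # temperature, in log space
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_entropy = -4.0                            # e.g. -(action dimension)

def update_temperature(log_probs: torch.Tensor) -> float:
    """One gradient step on the temperature: alpha rises when the policy's
    entropy falls below its target (too deterministic) and falls when the
    policy is more random than the target."""
    alpha_loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()
```

In use, log_probs would be the log-probabilities of actions sampled by the current policy on a training batch, and the returned alpha would scale the entropy bonus in the policy loss.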
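The distributed scheme is described only at a high level, so the sketch below shows the generic Hogwild-style pattern of lock-free asynchronous gradient pushes from several threads into a shared network; the toy linear model, dummy loss, and thread count are placeholders.

```python
import threading
import torch

shared_model = torch.nn.Linear(8, 4)  # stands in for the shared policy network

def worker(rank: int, steps: int = 100) -> None:
    local_model = torch.nn.Linear(8, 4)               # per-thread copy
    opt = torch.optim.SGD(shared_model.parameters(), lr=1e-3)
    for _ in range(steps):
        local_model.load_state_dict(shared_model.state_dict())  # pull weights
        loss = local_model(torch.randn(16, 8)).pow(2).mean()    # dummy loss
        grads = torch.autograd.grad(loss, tuple(local_model.parameters()))
        for p, g in zip(shared_model.parameters(), grads):
            p.grad = g        # push local gradients to the shared parameters
        opt.step()            # lock-free step; races are tolerated (Hogwild)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```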
Keywords/Search Tags:Reinforcement Learning, Sparse Rewards, Multi-goal-conditioned, Coupling Error, Exploration-exploitation Trade-off