
Improving The Generalization Of Reinforcement Learning In Continuous Control Via State Instability Regularizer

Posted on: 2021-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: C X Zhao
Full Text: PDF
GTID: 2428330620468139
Subject: Computer Science and Technology

Abstract/Summary:
Reinforcement learning has proved to be a powerful tool for a variety of sequential decision-making problems and has become increasingly popular in the machine learning community. However, recent results show that models trained with reinforcement learning algorithms generalize poorly: when stochasticity is introduced into the training environments, models that perform well in training often show a significant performance degradation in the testing environments.

In this paper, we address the generalization issue of reinforcement learning in the context of continuous control. We argue that policy iteration driven by unstable states contributes to the lack of generalization: reinforcement learning models can easily overfit to states that have high value estimates but are sensitive to environmental shift or stochasticity. We therefore propose a novel regularization strategy, termed the State Instability Regularizer (SIR), which improves generalization by reducing the value estimates of unstable states. For each state, the SIR is defined as the negative KL divergence between the optimal policy with respect to the value function and the optimal policy with respect to the adversarial value function. To make the regularizer tractable to compute, we derive a lower bound of the KL divergence; this lower bound can be combined with many model-free reinforcement learning algorithms to enhance generalization, and we prove theoretical convergence when training with it.

We further propose the SIR-TD3 algorithm, a regularized version of TD3 with better generalization ability. Our experimental study on six continuous control benchmarks confirms that SIR-TD3 not only reduces the variance of performance in the training environments but also outperforms existing baselines in testing environments with environmental perturbations.
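To make the regularizer concrete, the sketch below illustrates the core idea under simplifying assumptions not stated in the abstract: the policy is a diagonal Gaussian over actions, and the adversarial value function is approximated by evaluating the policy at an adversarially perturbed state. The names `gaussian_kl` and `sir_penalty`, the perturbation scale `eps`, and the state-perturbation proxy are all hypothetical choices for illustration, not the thesis's exact construction.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between two diagonal Gaussian action distributions,
    summed over action dimensions (closed form)."""
    return np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
        - 0.5
    )

def sir_penalty(policy, state, direction, eps=0.05):
    """State Instability Regularizer (sketch): the negative KL divergence
    between the policy's action distribution at a state and at an
    adversarially perturbed copy of that state.

    A large KL means the policy changes sharply under a small state
    perturbation, i.e. the state is unstable, so its value estimate
    should be discounted. `policy` maps a state to (mean, std)."""
    mu, sigma = policy(state)
    mu_adv, sigma_adv = policy(state + eps * direction)
    return -gaussian_kl(mu, sigma, mu_adv, sigma_adv)
```

Since KL divergence is non-negative, the penalty is always non-positive: stable states (small KL) are left almost untouched, while unstable states have their value estimates pushed down, which is the effect the regularizer is designed to achieve.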
Keywords/Search Tags:reinforcement learning, adversarial learning, generalization, robustness, continuous control