
Improving The Generalization Of Reinforcement Learning In Continuous Control Via State Instability Regularizer

Posted on: 2021-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: C X Zhao
Full Text: PDF
GTID: 2428330620468139
Subject: Computer Science and Technology

Abstract/Summary:
Reinforcement learning has proved to be a powerful tool for a variety of sequential decision-making problems and has become increasingly popular in the machine learning community. However, recent results show that models trained with reinforcement learning algorithms generalize poorly: when stochasticity is introduced into the training environments, models that perform well in training often show a significant performance degradation in the testing environments.

In this paper, we address the generalization issue of reinforcement learning in the context of continuous control. We argue that policy iteration driven by unstable states contributes to the lack of generalization: reinforcement learning models can easily overfit to states that have high value estimates but are sensitive to environmental shift or stochasticity. We therefore propose a novel regularization strategy, termed the State Instability Regularizer (SIR), which improves generalization by reducing the value estimates of unstable states. For each state, the SIR is defined as the negative KL divergence between the optimal policy with respect to the value function and the optimal policy with respect to the adversarial value function. To make the regularizer tractable to compute, we derive a lower bound of the KL divergence; this lower bound can be combined with many model-free reinforcement learning algorithms to enhance generalization, and we prove theoretical convergence when training with it.

We further propose the SIR-TD3 algorithm, a regularized version of TD3 with better generalization ability. Our experimental study on six continuous control benchmarks confirms that SIR-TD3 not only reduces the variance of performance in the training environments but also outperforms existing baselines in testing environments with environmental perturbations.
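To make the regularizer concrete, the sketch below illustrates the core idea under simplifying assumptions not stated in the abstract: the policy is a diagonal Gaussian over actions, and the adversarial value function is approximated by evaluating the policy at an adversarially perturbed state. The names `gaussian_kl` and `sir_penalty`, the perturbation scale `eps`, and the state-perturbation proxy are all hypothetical choices for illustration, not the thesis's exact construction.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between two diagonal Gaussian action distributions,
    summed over action dimensions (closed form)."""
    return np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
        - 0.5
    )

def sir_penalty(policy, state, direction, eps=0.05):
    """State Instability Regularizer (sketch): the negative KL divergence
    between the policy's action distribution at a state and at an
    adversarially perturbed copy of that state.

    A large KL means the policy changes sharply under a small state
    perturbation, i.e. the state is unstable, so its value estimate
    should be discounted. `policy` maps a state to (mean, std)."""
    mu, sigma = policy(state)
    mu_adv, sigma_adv = policy(state + eps * direction)
    return -gaussian_kl(mu, sigma, mu_adv, sigma_adv)
```

Since KL divergence is non-negative, the penalty is always non-positive: stable states (small KL) are left almost untouched, while unstable states have their value estimates pushed down, which is the effect the regularizer is designed to achieve.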
Keywords/Search Tags:reinforcement learning, adversarial learning, generalization, robustness, continuous control