
Research On Uncertainty-weighted Offline Reinforcement Learning

Posted on: 2024-01-26  Degree: Master  Type: Thesis
Country: China  Candidate: B H Xie  Full Text: PDF
GTID: 2568306932956039  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of machine learning, artificial intelligence has been applied to a wide range of scenarios, and significant progress has been made in reinforcement learning for video, board, and card games. However, applying reinforcement learning efficiently in real-world settings where exploration costs are high, such as healthcare, autonomous vehicles, and robotics, remains a challenge. Offline reinforcement learning avoids exploration costs by learning policies from previously collected datasets. However, the lack of continual interaction with the environment introduces a distribution shift between the learned policy and the behavior policy. Training then compounds the extrapolation errors caused by out-of-distribution (OOD) actions or states, which can lead to training failure.

To tackle this issue, most existing approaches fall into two categories, depending on whether OOD actions are used during policy evaluation: reinforcement learning-based (RL-based) methods and imitation learning-based (IL-based) methods. The former restrict the distance between the target policy and the behavior policy through value-function regularization or policy constraints; these approaches require a trade-off between the accuracy of value estimation and policy improvement, and may become over-conservative. The latter apply imitation learning on the dataset rather than querying the values of unseen actions, thereby avoiding extrapolation error; but avoiding OOD samples also limits performance improvement and forfeits the possibility of generalizing beyond the dataset.

To address these problems, this thesis proposes two offline reinforcement learning methods from the perspective of uncertainty in deep learning. To address the over-conservative value estimation of RL-based offline reinforcement learning methods, this thesis proposes Double Actors and Uncertainty-Weighted Critics for Offline Reinforcement Learning (DAUWC). During policy evaluation, a moderately optimistic state-value function is learned with double actors, while uncertainty estimation is introduced through an ensemble of state-action value networks. During policy extraction, the learned policy is implicitly constrained toward the behavior policy by weighting the advantage value, which stabilizes training.

To address the limited performance improvement of IL-based offline reinforcement learning methods, this thesis proposes Uncertainty-Weighted Implicit Q-Learning (UWIQL). During policy evaluation, the offline data is fully exploited while OOD samples are avoided; uncertainty estimation and expectile regression are used to estimate the state-value function, effectively improving its generalization performance. During policy extraction, advantage-weighted behavioral cloning and uncertainty optimization are used to maximize the advantage value function, thereby improving the exploitation ability.

Experimental results on D4RL, a standard benchmark for offline RL, show that the proposed methods outperform state-of-the-art methods, achieving higher normalized scores on most tasks and paving the way for practical industrial applications of reinforcement learning. In future work, we will continue our research on offline reinforcement learning and apply our methods to more industrial scenarios.
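The ensemble-based uncertainty weighting that DAUWC relies on can be sketched as follows. The thesis abstract does not give the implementation, so everything below is an illustrative assumption: the ensemble is stood in for by K linear Q-functions, and the function names (`q_ensemble`, `uncertainty_weight`) and the `exp(-beta * std)` weighting rule are hypothetical choices, not the thesis's own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for K trained Q-networks: K linear Q-functions with
# slightly different weights, so their estimates disagree on some (s, a) pairs.
K, state_dim, action_dim = 5, 4, 2
weights = [rng.normal(size=state_dim + action_dim) * (1 + 0.1 * k) for k in range(K)]

def q_ensemble(state, action):
    """Return the K ensemble estimates of Q(s, a)."""
    sa = np.concatenate([state, action])
    return np.array([w @ sa for w in weights])

def uncertainty_weight(state, action, beta=1.0):
    """Down-weight (s, a) pairs whose ensemble Q-estimates disagree.

    A high standard deviation across the ensemble signals a likely
    out-of-distribution pair, so its advantage contributes less to the
    policy update. The exponential form is an illustrative choice.
    """
    std = q_ensemble(state, action).std()
    return np.exp(-beta * std)

# Usage: weight lies in (0, 1]; more disagreement -> smaller weight.
s, a = rng.normal(size=state_dim), rng.normal(size=action_dim)
w = uncertainty_weight(s, a)
```

In a full implementation this weight would multiply the advantage term in the policy-extraction objective, so confidently estimated in-distribution actions dominate the update.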
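The two ingredients UWIQL builds on, expectile regression for the state-value function and advantage-weighted behavioral cloning for policy extraction, can be sketched in the standard Implicit Q-Learning form. The parameter values (`tau`, `temperature`, the clip at 100) are common illustrative defaults, not values stated in the thesis.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 loss for value learning, diff = Q(s, a) - V(s).

    For tau > 0.5 the loss penalizes under-estimation (diff > 0) more than
    over-estimation, so V(s) approaches an upper expectile of Q(s, a) while
    using only actions that appear in the dataset (no OOD queries).
    """
    weight = np.where(diff > 0, tau, 1 - tau)
    return weight * diff ** 2

def awbc_weight(advantage, temperature=3.0, clip=100.0):
    """Advantage-weighted behavioral-cloning weight: exp(T * A), clipped.

    Log-likelihood of dataset actions is scaled by this weight, so the
    policy imitates the dataset but favors high-advantage actions.
    """
    return np.minimum(np.exp(temperature * advantage), clip)
```

For example, with `tau = 0.7` an under-estimate (`diff = 1.0`) is penalized with weight 0.7 while an over-estimate (`diff = -1.0`) gets weight 0.3, which is what pushes V(s) above the mean of in-dataset Q-values.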
Keywords/Search Tags:Machine Learning, Reinforcement Learning, Offline Reinforcement Learning, Imitation Learning