In modern machine learning, including reinforcement learning (RL), representation learning has become a powerful technique for reducing sample complexity, yet the theoretical understanding of "how" and "why" remains limited: how can a good representation be learned in RL, and why does representation learning benefit RL?

This paper first answers the question of "how" by investigating representation learning in reward-free RL under low-rank Markov decision process (MDP) models, in which an agent first explores the environment without any reward information in order to achieve certain learning goals afterwards for any given reward. Although several representation learning algorithms have been proposed for reward-free low-rank MDPs, their sample complexity remains far from satisfactory. This work provides the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs. The lower bound implies that it is strictly harder to find a near-optimal policy under low-rank MDPs than under linear MDPs. This paper then proposes a novel model-based algorithm, named RAFFLE, and shows that it can both find an ε-optimal policy and achieve ε-accurate system identification via reward-free exploration, with a sample complexity that significantly improves upon previous results. This sample complexity matches our lower bound in its dependence on ε, as well as on K in the large-d regime, where d and K denote the representation dimension and the cardinality of the action space, respectively. In addition, this paper provides a planning algorithm (which requires no further interaction with the true environment) for RAFFLE to learn a near-accurate representation, which is the first known representation learning guarantee under this setting.
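For concreteness, the sketch below records a commonly used formulation of a low-rank MDP and of the reward-free learning goal described above; the notation (φ*_h, μ*_h, d, H, ε) is standard in this literature and is stated here as an assumption about the setup rather than taken verbatim from this paper.

\[
  P_h(s' \mid s, a) \;=\; \big\langle \phi_h^{*}(s,a),\, \mu_h^{*}(s') \big\rangle,
  \qquad \phi_h^{*}:\mathcal{S}\times\mathcal{A}\to\mathbb{R}^d,\quad
  \mu_h^{*}:\mathcal{S}\to\mathbb{R}^d,\quad h=1,\dots,H,
\]
where neither φ*_h nor μ*_h is known to the agent (in contrast to linear MDPs, where the feature map φ*_h is given). Reward-free exploration then requires that, after interacting with the environment without observing rewards, the agent can return for any given reward function r a policy \(\hat{\pi}_r\) satisfying
\[
  V_1^{*}(r) \;-\; V_1^{\hat{\pi}_r}(r) \;\le\; \varepsilon .
\]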
To answer the question of "why", this paper first studies multitask low-rank RL (as upstream training), where all tasks share a common representation, and proposes a new multitask reward-free algorithm called REFUEL. REFUEL learns both the transition kernel and a near-optimal policy for each task, and outputs an estimated representation for downstream tasks. Our result demonstrates that multitask representation learning is provably more sample-efficient than learning each task individually, as long as the total number of tasks is above a certain threshold. This paper then studies downstream RL in both online and offline settings, where the agent is assigned a new task that shares the same representation as the upstream tasks. For both settings, a sample-efficient algorithm is developed that finds a near-optimal policy whose suboptimality gap is bounded by the sum of the estimation error of the representation learned upstream and a term that vanishes as the number of downstream samples becomes large. Our downstream results for online and offline RL further capture the benefit of employing the representation learned upstream, as opposed to learning the representation of the low-rank model directly. To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask RL for both upstream and downstream tasks.

The two research threads of this paper explore the theoretical properties of representation learning in reinforcement learning by answering the questions of "how" and "why", and propose corresponding algorithms. This research helps guide algorithm design in practical applications, provides new analytical tools for future theoretical research, and expands the frontier of the statistical theory of reinforcement learning.
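As an illustration of the downstream guarantee summarized above, the suboptimality bound takes the following schematic form, where ε_up denotes the estimation error of the representation learned upstream and N the number of downstream samples; the symbols and the unspecified error term err(N) are introduced here only for illustration and do not reproduce the paper's exact rates.

\[
  V_1^{*} \;-\; V_1^{\hat{\pi}}
  \;\le\;
  \underbrace{\varepsilon_{\mathrm{up}}}_{\text{upstream representation error}}
  \;+\;
  \underbrace{\mathrm{err}(N)}_{\text{vanishes as } N \to \infty},
\]
in both the online and the offline downstream settings, so the benefit of upstream multitask training appears through the first term being small whenever sufficiently many upstream tasks are available.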