In modern machine learning, including reinforcement learning (RL), representation learning has become a powerful technique for reducing sample complexity, yet the theoretical understanding of "how" and "why" remains limited: how can a good representation be learned in RL, and why does representation learning benefit RL?

This paper first answers the question of "how" by investigating representation learning in reward-free RL under low-rank Markov decision process (MDP) models, in which an agent first explores the environment without any reward information in order to achieve certain learning goals afterwards for any given reward. Although several representation learning algorithms have been proposed for reward-free low-rank MDPs, their sample complexity remains far from satisfactory. This work provides the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs. The lower bound implies that it is strictly harder to find a near-optimal policy under low-rank MDPs than under linear MDPs. This paper then proposes a novel model-based algorithm, named RAFFLE, and shows that it can both find an ε-optimal policy and achieve ε-accurate system identification via reward-free exploration, with a sample complexity that significantly improves upon previous results. This sample complexity matches our lower bound in its dependence on ε, as well as on K in the large-d regime, where d and K denote the representation dimension and the cardinality of the action space, respectively. In addition, this paper provides a planning algorithm (which requires no further interaction with the true environment) for RAFFLE to learn a near-accurate representation, which is the first known representation learning guarantee under this setting.
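For concreteness, the sketch below records a commonly used formulation of a low-rank MDP and of the reward-free learning goal described above; the notation (φ*_h, μ*_h, d, H, ε) is standard in this literature and is stated here as an assumption about the setup rather than taken verbatim from this paper.

\[
  P_h(s' \mid s, a) \;=\; \big\langle \phi_h^{*}(s,a),\, \mu_h^{*}(s') \big\rangle,
  \qquad \phi_h^{*}:\mathcal{S}\times\mathcal{A}\to\mathbb{R}^d,\quad
  \mu_h^{*}:\mathcal{S}\to\mathbb{R}^d,\quad h=1,\dots,H,
\]
where neither φ*_h nor μ*_h is known to the agent (in contrast to linear MDPs, where the feature map φ*_h is given). Reward-free exploration then requires that, after interacting with the environment without observing rewards, the agent can return for any given reward function r a policy \(\hat{\pi}_r\) satisfying
\[
  V_1^{*}(r) \;-\; V_1^{\hat{\pi}_r}(r) \;\le\; \varepsilon .
\]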
To answer the question of "why", this paper first studies multitask low-rank RL (as upstream training), where all tasks share a common representation, and proposes a new multitask reward-free algorithm called REFUEL. REFUEL learns both the transition kernel and a near-optimal policy for each task, and outputs an estimated representation for downstream tasks. Our result demonstrates that multitask representation learning is provably more sample-efficient than learning each task individually, as long as the total number of tasks is above a certain threshold. This paper then studies downstream RL in both online and offline settings, where the agent is assigned a new task that shares the same representation as the upstream tasks. For both settings, a sample-efficient algorithm is developed that finds a near-optimal policy whose suboptimality gap is bounded by the sum of the estimation error of the representation learned upstream and a term that vanishes as the number of downstream samples becomes large. Our downstream results for online and offline RL further capture the benefit of employing the representation learned upstream, as opposed to learning the representation of the low-rank model directly. To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask RL for both upstream and downstream tasks.

The two research threads of this paper explore the theoretical properties of representation learning in reinforcement learning by answering the questions of "how" and "why", and propose corresponding algorithms. This research helps guide algorithm design in practical applications, provides new analytical tools for future theoretical research, and expands the frontier of the statistical theory of reinforcement learning.
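As an illustration of the downstream guarantee summarized above, the suboptimality bound takes the following schematic form, where ε_up denotes the estimation error of the representation learned upstream and N the number of downstream samples; the symbols and the unspecified error term err(N) are introduced here only for illustration and do not reproduce the paper's exact rates.

\[
  V_1^{*} \;-\; V_1^{\hat{\pi}}
  \;\le\;
  \underbrace{\varepsilon_{\mathrm{up}}}_{\text{upstream representation error}}
  \;+\;
  \underbrace{\mathrm{err}(N)}_{\text{vanishes as } N \to \infty},
\]
in both the online and the offline downstream settings, so the benefit of upstream multitask training appears through the first term being small whenever sufficiently many upstream tasks are available.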