
Supervised Reinforcement Learning: Methods and Applications

Posted on: 2022-07-07    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L Wang    Full Text: PDF
GTID: 1488306482487014    Subject: Software engineering
Abstract/Summary:
Reinforcement learning (RL) learns "what to do". Its goal is to solve sequential decision problems by learning a function that maps states of the environment to actions so as to maximize the cumulative reward. In contrast to supervised learning, a reinforcement learning agent is not told which actions to take, but must explore to discover the actions that yield the largest reward. In recent years, reinforcement learning has defeated human champions in a series of games such as Go, StarCraft, and DOTA. However, a set of assumptions that cannot be satisfied in practice makes it hard to apply in the real world: (a) Real-time interaction with the environment. Most reinforcement learning algorithms assume that the agent can interact with the environment online and learn from the experience it generates; in the real world, generating samples online is costly and risky. (b) No explicit reward function. Even with expert knowledge, it is difficult to specify a clear reward function for a real system. (c) High-dimensional action space. Advanced AI systems often need to handle large-scale action spaces.

Over the past decade, supervised learning methods have achieved success on many real-world problems. To address the above challenges, this paper leverages supervision signals from external behavior policies, expert trajectories, and domain knowledge to systematically study supervised reinforcement learning. We further evaluate the proposed methods on three types of social-good applications: medical, agricultural, and advertising. The main contributions of this research are as follows:

1. Reinforcement learning with behavior policy supervision. Real-world applications often come with historical data generated by an external behavior policy. For challenge (a), this paper uses data generated by the behavior policy to learn an agent policy without interacting with the environment: (1) We propose a supervised reinforcement learning model that considers
the difference between the behavior policy's action and the agent's action as an augmented reward signal. (2) For sparse behavior-policy data, we further propose a constrained reinforcement learning method that first constrains the agent's policy to the distribution of the behavior policy and then optimizes the policy within the constrained policy space. (3) Finally, to enable off-line evaluation, we propose an off-policy evaluation method based on multiple policies that estimates a policy's value without interacting with the environment. We theoretically prove that the proposed estimator has smaller estimation bias.

2. Reinforcement learning with expert demonstrations. For challenge (b), imitation learning learns an agent policy by directly reproducing the expert's behavior. In the real world there are both positive samples (e.g., data of surviving patients) and negative samples (e.g., data of deceased patients): (1) This paper combines positive and negative samples for imitation learning, in which positive samples guide the agent to learn correct actions and negative samples guide the agent to avoid wrong actions. We show, in theory and in experiments, that the learned agent policy stays close to the positive samples and far from the negative samples. (2) To solve complex imitation learning problems that mix multiple sub-policies, we propose a hierarchical imitation learning method that uses an upper confidence bound (UCB) method to learn a high-level policy and uses imitation learning to learn a series of sub-policies that mimic the expert policy.

3. Reinforcement learning with domain knowledge supervision. For challenge (c), we transform high-dimensional-action-space RL into multi-agent RL. We regard each action as an agent and handle the high-dimensional action space by sharing policies among similar agents. This paper studies two types of multi-agent reinforcement learning methods that use domain knowledge to find similar agents for sharing
policies. (1) We propose a hierarchical multi-agent RL method that decomposes the overall policy into a manager policy and multiple sub-policies based on agent trajectories and domain knowledge, so that M agents share K policies, where K is much smaller than M. (2) To handle a varying number of agents, we further propose an agent-agnostic multi-agent reinforcement learning model that uses domain knowledge to learn agent representations.
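The policy-sharing idea in contribution 3 can be sketched in a few lines. This is only an illustration of the general mechanism (M agents mapped onto K shared policies), not the thesis's actual method: the `group_of` rule and the `SharedPolicy` class are hypothetical stand-ins for the domain-knowledge grouping and learned sub-policies described above.

```python
K = 3   # number of shared sub-policies
M = 12  # number of agents (one per action dimension)

def group_of(agent_id):
    """Assign each agent to a policy group. A real system would use
    domain knowledge (e.g., action similarity) instead of this
    illustrative modulo rule."""
    return agent_id % K

class SharedPolicy:
    """Placeholder for a learned sub-policy."""
    def __init__(self, n_actions=2):
        self.n_actions = n_actions

    def act(self, state):
        # Placeholder decision rule; a real policy would be learned.
        return (state + self.n_actions) % self.n_actions

# Only K policies exist, however many agents there are.
policies = [SharedPolicy() for _ in range(K)]

def joint_action(state):
    # Each of the M agents acts through its group's shared policy,
    # so learning scales with K rather than with M.
    return [policies[group_of(i)].act(state) for i in range(M)]

actions = joint_action(state=7)
```

The point of the construction is that the number of parameters to learn grows with K, not with the (possibly very large) number of actions M.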
Keywords/Search Tags:Reinforcement Learning, Deep Learning, Imitation Learning, Off-line Reinforcement Learning, Multi-agent Reinforcement Learning