
Supervised Reinforcement Learning: Methods and Applications

Posted on: 2022-07-07    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L Wang    Full Text: PDF
GTID: 1488306482487014    Subject: Software engineering
Abstract/Summary:
Reinforcement learning (RL) learns "what to do". Its goal is to solve sequential decision problems by learning a function that maps states of the environment to actions so as to maximize the cumulative reward. In contrast to supervised learning, a reinforcement learning agent is not told which actions to take, but must explore to discover the actions that yield the largest reward. In recent years, reinforcement learning has defeated human champions in a series of games such as Go, StarCraft, and DOTA. However, a set of assumptions that cannot be satisfied in practice makes it hard to apply in the real world: (a) Real-time interaction with the environment. Most reinforcement learning algorithms assume that the agent can interact with the environment online and learn from the experience it generates; in the real world, generating samples online is costly and risky. (b) No explicit reward function. Even with expert knowledge, it is difficult to specify a clear reward function for a real system. (c) High-dimensional action space. Advanced AI systems often need to handle large-scale action spaces.

Over the past decade, supervised learning methods have achieved success on many real-world problems. To address the above challenges, this paper leverages supervision signals from external behavior policies, expert trajectories, and domain knowledge to systematically study supervised reinforcement learning. We further evaluate the proposed methods on three types of social-good applications: medical, agricultural, and advertising. The main contributions of this research are as follows:

1. Reinforcement learning with behavior policy supervision. Real-world applications often come with historical data generated by an external behavior policy. For challenge (a), this paper uses data generated by the behavior policy to learn an agent policy without interacting with the environment: (1) We propose a supervised reinforcement learning model that considers
the difference between the behavior policy's action and the agent's action as an augmented reward signal. (2) For sparse behavior-policy data, we further propose a constrained reinforcement learning method that first constrains the agent's policy to the distribution of the behavior policy and then optimizes the policy within the constrained policy space. (3) Finally, to enable off-line evaluation, we propose an off-policy evaluation method based on multiple policies that estimates a policy's value without interacting with the environment. We theoretically prove that the proposed estimator has smaller estimation bias.

2. Reinforcement learning with expert demonstrations. For challenge (b), imitation learning learns an agent policy by directly reproducing the expert's behavior. In the real world there are both positive samples (e.g., data of surviving patients) and negative samples (e.g., data of deceased patients): (1) This paper combines positive and negative samples for imitation learning, in which positive samples guide the agent to learn correct actions and negative samples guide the agent to avoid wrong actions. We show, in theory and in experiments, that the learned agent policy stays close to the positive samples and far from the negative samples. (2) To solve complex imitation learning problems that mix multiple sub-policies, we propose a hierarchical imitation learning method that uses an upper confidence bound (UCB) method to learn a high-level policy and uses imitation learning to learn a series of sub-policies that mimic the expert policy.

3. Reinforcement learning with domain knowledge supervision. For challenge (c), we transform high-dimensional-action-space RL into multi-agent RL. We regard each action as an agent and handle the high-dimensional action space by sharing policies among similar agents. This paper studies two types of multi-agent reinforcement learning methods that use domain knowledge to find similar agents for sharing
policies. (1) We propose a hierarchical multi-agent RL method that decomposes the overall policy into a manager policy and multiple sub-policies based on agent trajectories and domain knowledge, so that M agents share K policies, where K is much smaller than M. (2) To handle a varying number of agents, we further propose an agent-agnostic multi-agent reinforcement learning model that uses domain knowledge to learn agent representations.
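The policy-sharing idea in contribution 3 can be sketched in a few lines. This is only an illustration of the general mechanism (M agents mapped onto K shared policies), not the thesis's actual method: the `group_of` rule and the `SharedPolicy` class are hypothetical stand-ins for the domain-knowledge grouping and learned sub-policies described above.

```python
K = 3   # number of shared sub-policies
M = 12  # number of agents (one per action dimension)

def group_of(agent_id):
    """Assign each agent to a policy group. A real system would use
    domain knowledge (e.g., action similarity) instead of this
    illustrative modulo rule."""
    return agent_id % K

class SharedPolicy:
    """Placeholder for a learned sub-policy."""
    def __init__(self, n_actions=2):
        self.n_actions = n_actions

    def act(self, state):
        # Placeholder decision rule; a real policy would be learned.
        return (state + self.n_actions) % self.n_actions

# Only K policies exist, however many agents there are.
policies = [SharedPolicy() for _ in range(K)]

def joint_action(state):
    # Each of the M agents acts through its group's shared policy,
    # so learning scales with K rather than with M.
    return [policies[group_of(i)].act(state) for i in range(M)]

actions = joint_action(state=7)
```

The point of the construction is that the number of parameters to learn grows with K, not with the (possibly very large) number of actions M.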
Keywords/Search Tags:Reinforcement Learning, Deep Learning, Imitation Learning, Off-line Reinforcement Learning, Multi-agent Reinforcement Learning