
Research On Sample-efficient Reinforcement Learning Methods

Posted on: 2022-04-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Li
GTID: 1488306746456654
Subject: Computer Science and Technology

Abstract/Summary:
Reinforcement learning is one of the major directions in modern artificial intelligence, and its rich application scenarios make it a significant research area. Reinforcement learning concentrates on how an agent interacts with the environment to learn policies. During the interaction process, reinforcement learning faces two main challenges: (1) how to explore efficiently, i.e., how to obtain as much information as possible from as few interactions as possible in order to find the optimal policy; and (2) how to estimate delayed rewards, i.e., how to evaluate the impact of subsequent interactions and assign delayed rewards to current actions. Moreover, various uncertainties in decision scenarios, such as unknown environmental parameters and the influence of other players, make these challenges more difficult. This thesis proposes sample-efficient reinforcement learning algorithms that solve for the optimal policy in the face of uncertainty. The main contributions of this thesis are listed below:

1. For the exploration problem under reward uncertainty, this thesis considers contextual bandits and proposes a Bayesian method for the best-subset-identification problem. This method uses contextual information to improve the efficiency of exploration, making it a sample-efficient algorithm.

2. For the exploration problem under unknown transition parameters and unknown transition perturbations, this thesis considers the robust Markov decision process and gives an algorithm for solving for the robust policy. This algorithm can be proved to converge to a near-optimal robust policy with a polynomial number of samples when the perturbation satisfies certain smoothness assumptions.

3. For the exploration problem in multi-agent reinforcement learning, this thesis considers the finite-horizon turn-based stochastic game and proposes a sample-efficient algorithm for solving for the Nash equilibrium of the game.

4. For the reward-delay problem in multi-agent reinforcement learning, this thesis considers the two-player zero-sum extensive game with perfect information and gives an algorithm that makes use of the information of a strong opponent. This algorithm improves the agent's policy efficiently through interaction with the strong opponent and ensures the robustness of the learned policy against the uncertainty of the opponent.
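To illustrate the kind of exploration-under-reward-uncertainty problem addressed in the first contribution, the following is a minimal sketch of Thompson sampling for a linear contextual bandit. It is a standard Bayesian exploration baseline, not the thesis's best-subset-identification method; the environment, feature dimension, prior, and noise level are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T = 5, 4, 2000
theta_true = rng.normal(size=d)  # unknown reward parameter of the environment

# Gaussian posterior over theta: N(mu, A^{-1}), assuming unit-variance
# Gaussian prior and observation noise.
A = np.eye(d)      # posterior precision matrix
b = np.zeros(d)

regret = 0.0
for t in range(T):
    contexts = rng.normal(size=(n_arms, d))          # one feature vector per arm
    mu = np.linalg.solve(A, b)                       # posterior mean
    theta_sample = rng.multivariate_normal(mu, np.linalg.inv(A))
    arm = int(np.argmax(contexts @ theta_sample))    # act greedily on the sample
    reward = contexts[arm] @ theta_true + rng.normal(scale=0.1)
    # Bayesian linear-regression update with the observed (context, reward) pair.
    A += np.outer(contexts[arm], contexts[arm])
    b += reward * contexts[arm]
    regret += np.max(contexts @ theta_true) - contexts[arm] @ theta_true

print(f"average per-step regret after {T} rounds: {regret / T:.3f}")
```

Sampling the parameter from the posterior (rather than always using the posterior mean) is what drives exploration: arms that are plausibly optimal under the current uncertainty keep being tried until the posterior concentrates, which is why the average per-step regret shrinks as more contexts are observed.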
Keywords/Search Tags: reinforcement learning, contextual bandits, Markov decision process, multi-agent reinforcement learning