
Research On Sample-efficient Reinforcement Learning Methods

Posted on: 2022-04-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Li
GTID: 1488306746456654
Subject: Computer Science and Technology

Abstract/Summary:
Reinforcement learning is one of the major directions in modern artificial intelligence, and its rich application scenarios make it a significant research area. Reinforcement learning concentrates on how an agent interacts with the environment to learn policies. During the interaction process, reinforcement learning faces two main challenges: (1) how to explore efficiently, i.e., how to obtain as much information as possible from as few interactions as possible in order to find the optimal policy; and (2) how to estimate delayed rewards, i.e., how to evaluate the impact of subsequent interactions and assign delayed rewards to current actions. Moreover, various uncertainties in decision scenarios, such as unknown environmental parameters and the influence of other players, make these challenges more difficult. This thesis proposes sample-efficient reinforcement learning algorithms that solve for the optimal policy in the face of uncertainty. The main contributions of this thesis are listed below:

1. For the exploration problem under reward uncertainty, this thesis considers contextual bandits and proposes a Bayesian method for the best-subset-identification problem. This method uses contextual information to improve the efficiency of exploration, making it a sample-efficient algorithm.

2. For the exploration problem under unknown transition parameters and unknown transition perturbations, this thesis considers the robust Markov decision process and gives an algorithm for solving for the robust policy. This algorithm can be proved to converge to a near-optimal robust policy with a polynomial number of samples when the perturbation satisfies certain smoothness assumptions.

3. For the exploration problem in multi-agent reinforcement learning, this thesis considers the finite-horizon turn-based stochastic game and proposes a sample-efficient algorithm for solving for the Nash equilibrium of the game.

4. For the reward-delay problem in multi-agent reinforcement learning, this thesis considers the two-player zero-sum extensive game with perfect information and gives an algorithm that makes use of the information of a strong opponent. This algorithm improves the agent's policy efficiently through interaction with the strong opponent and ensures the robustness of the learned policy against the uncertainty of the opponent.
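To illustrate the kind of exploration-under-reward-uncertainty problem addressed in the first contribution, the following is a minimal sketch of Thompson sampling for a linear contextual bandit. It is a standard Bayesian exploration baseline, not the thesis's best-subset-identification method; the environment, feature dimension, prior, and noise level are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T = 5, 4, 2000
theta_true = rng.normal(size=d)  # unknown reward parameter of the environment

# Gaussian posterior over theta: N(mu, A^{-1}), assuming unit-variance
# Gaussian prior and observation noise.
A = np.eye(d)      # posterior precision matrix
b = np.zeros(d)

regret = 0.0
for t in range(T):
    contexts = rng.normal(size=(n_arms, d))          # one feature vector per arm
    mu = np.linalg.solve(A, b)                       # posterior mean
    theta_sample = rng.multivariate_normal(mu, np.linalg.inv(A))
    arm = int(np.argmax(contexts @ theta_sample))    # act greedily on the sample
    reward = contexts[arm] @ theta_true + rng.normal(scale=0.1)
    # Bayesian linear-regression update with the observed (context, reward) pair.
    A += np.outer(contexts[arm], contexts[arm])
    b += reward * contexts[arm]
    regret += np.max(contexts @ theta_true) - contexts[arm] @ theta_true

print(f"average per-step regret after {T} rounds: {regret / T:.3f}")
```

Sampling the parameter from the posterior (rather than always using the posterior mean) is what drives exploration: arms that are plausibly optimal under the current uncertainty keep being tried until the posterior concentrates, which is why the average per-step regret shrinks as more contexts are observed.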
Keywords/Search Tags: reinforcement learning, contextual bandits, Markov decision process, multi-agent reinforcement learning