
Multi-step Unified Approaches With Function Approximation In Reinforcement Learning

Posted on: 2020-01-30    Degree: Master    Type: Thesis
Country: China    Candidate: Y Zhang    Full Text: PDF
GTID: 2428330572496581    Subject: Computer Science and Technology
Abstract/Summary:
Unifying existing algorithms and creating new ones with better performance is an important research direction in reinforcement learning. For example, Sutton proposed the TD(λ) algorithm in 1998, unifying the Monte Carlo method (λ=1) and the one-step TD learning algorithm (λ=0), and demonstrated that the new algorithms generated with λ between 0 and 1 are likely to perform better. In 2017, Sutton and Barto proposed the Q(σ) algorithm, which combines multi-step Sarsa (σ=1) and multi-step Expected Sarsa (σ=0), and showed that the new algorithms generated with σ between 0 and 1 can perform better than the existing ones. In 2018, Yang et al. added eligibility traces to the multi-step unified algorithm and proposed the Q(σ,λ) algorithm, which reduces the memory usage and computational cost of the Q(σ) algorithm. However, both Q(σ) and Q(σ,λ) are tabular methods and cannot handle reinforcement learning problems with large state spaces. Function approximation is the common way to extend tabular methods to such problems, and it is the main research content of this thesis.

This thesis explores linear and non-linear function approximation for unified reinforcement learning algorithms. For linear approximation, we study the algorithms derived from two loss functions, analyze their convergence conditions, prove convergence under the different loss functions, and verify the results experimentally. For non-linear approximation, we study the loss function of the unified algorithm, describe the algorithm framework in detail, and verify it experimentally. The main contributions of this thesis are as follows:

(1) We extend the Q(σ,λ) algorithm from the tabular case to linear function approximation. Based on the mean squared error (MSE) loss function, we propose off-line Q(σ,λ). We derive the necessary and sufficient conditions for convergence of the off-line linear approximation method, and we prove that off-policy Q(σ,λ) is not stable.

(2) To resolve this instability, we propose the Gradient Q(σ,λ) algorithm based on the mean squared projected Bellman error (MSPBE) loss function and prove its convergence in the off-policy setting. Gradient Q(σ,λ) unifies GTD(λ) and linear Q^π(λ). We test the performance of the new algorithms generated by Gradient Q(σ,λ) with σ between 0 and 1: they reach the same or better convergence results 10 to 50 episodes earlier than GTD(λ) and linear Q^π(λ). Testing 49 values of σ between 0 and 1 with a step of 0.02, 42.2% of them converge better than the algorithms unified by Gradient Q(σ,λ).

(3) We extend the tabular Q(σ) algorithm to a non-linear function approximation method and propose the DQN(σ) algorithm based on deep reinforcement learning. DQN(σ) combines the Deep Q-learning algorithm and the Deep Expected Sarsa algorithm. Experiments are conducted in three classic control environments and two video game environments. The new algorithms generated with σ between 0 and 1 reach the same or better convergence results while requiring 15 to 200 fewer episodes than the existing algorithms unified by DQN(σ).
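
To make the σ-unification concrete, the following is a minimal sketch (in LaTeX) of the one-step backup target that Q(σ) interpolates; the thesis works with the multi-step and eligibility-trace versions, so this is the standard textbook form rather than the thesis's own notation. Setting σ = 1 recovers the Sarsa backup and σ = 0 recovers the Expected Sarsa backup:

    % One-step Q(sigma) target under a policy pi
    G_{t:t+1} = R_{t+1} + \gamma \Big[ \sigma\, Q(S_{t+1}, A_{t+1})
                + (1-\sigma) \sum_{a} \pi(a \mid S_{t+1})\, Q(S_{t+1}, a) \Big]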
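Contribution (2) builds on the projected-Bellman-error objective used by the gradient-TD family of methods. A generic statement of that objective for a linear estimate Φθ is given below; Π denotes the projection onto the span of the features under the state weighting D, and this form is standard in the gradient-TD literature rather than copied from the thesis:

    % Mean squared projected Bellman error for a linear estimate Phi * theta
    \mathrm{MSPBE}(\theta) = \left\lVert \Phi\theta - \Pi\, T^{\pi} \Phi\theta \right\rVert_{D}^{2}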
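For contribution (3), the sketch below illustrates how a DQN(σ)-style target could blend the Deep Q-learning backup (greedy max over the target network) with the Deep Expected Sarsa backup (expectation under an ε-greedy policy). The function name, the ε-greedy policy, and the exact blending rule are illustrative assumptions, not the thesis's implementation:

    import numpy as np

    def dqn_sigma_target(q_next, reward, gamma, sigma, epsilon):
        # q_next: target-network values Q(s', .) for all actions, shape (num_actions,)
        # sigma = 1.0 -> Deep Q-learning backup, sigma = 0.0 -> Deep Expected Sarsa backup (assumed blend)
        num_actions = q_next.shape[0]
        # Q-learning component: bootstrap from the greedy action.
        greedy_backup = np.max(q_next)
        # Expected Sarsa component: expectation under an epsilon-greedy policy.
        probs = np.full(num_actions, epsilon / num_actions)
        probs[np.argmax(q_next)] += 1.0 - epsilon
        expected_backup = np.dot(probs, q_next)
        # sigma interpolates between the two backups.
        return reward + gamma * (sigma * greedy_backup + (1.0 - sigma) * expected_backup)

    # Example: three actions, sigma = 0.5 mixes the two targets equally.
    print(dqn_sigma_target(np.array([1.0, 2.0, 0.5]),
                           reward=0.1, gamma=0.99, sigma=0.5, epsilon=0.1))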
Keywords/Search Tags:Reinforcement learning, Unified algorithm, Function approximation