
Actor-Critic Algorithms With Continuous Action Spaces

Posted on: 2018-01-07    Degree: Master    Type: Thesis
Country: China    Candidate: P Zhang    Full Text: PDF
GTID: 2348330542465214    Subject: Software engineering
Abstract/Summary:
Reinforcement learning aims to learn a mapping from states to actions that maximizes a numerical reward signal, and it has become an important branch of machine learning. Optimal control is a central problem in reinforcement learning, but the "curse of dimensionality" hinders its application in continuous state and action spaces. To address the weaknesses of traditional reinforcement learning methods, this thesis proposes several improved actor-critic algorithms.

i. For optimal control problems with range-constrained actions in continuous spaces, an action-weighted actor-critic algorithm is proposed. The algorithm takes the actor-critic architecture as its main framework, approximates the optimal value function and the optimal policy with linear functions, and updates the value-function and policy parameters by gradient descent. The optimal policy is obtained by weighting two groups of policy parameters, which keeps the resulting action within its admissible range. To further improve convergence, an improved temporal-difference update is designed: the temporal-difference error is used to update the policy, and eligibility traces are introduced to adjust the policy parameters.

ii. To improve the convergence rate, an actor-critic algorithm based on an improved natural policy gradient is designed. The algorithm takes the maximal expected return as its objective and replaces the stochastic gradient with the natural gradient when updating the policy parameters, which accelerates convergence to the optimal policy parameters. Action weighting transforms the search for the optimal action into solving for two groups of policy parameters, and a policy eligibility trace is introduced to optimize the policy-parameter update rule.

iii. Traditional reinforcement learning methods find the optimal policy and value function by maximizing the expected return. Such methods converge quickly, but they are prone to maximization bias, so the learned policy easily falls into a local optimum. To improve stability, an actor-critic algorithm based on double value functions is proposed. The algorithm replaces the single value function with two groups of value-function estimators: one estimates the value of the current state, while the other estimates the value of the next state. The interaction between the two estimators reduces the possibility of the policy falling into a local optimum.

The experimental results show that the double weighted actor-critic algorithm has the best convergence performance and high robustness.
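As a rough illustration of the ingredients named in item i (linear function approximation, a TD-error-driven policy update, eligibility traces, and a range-constrained action), here is a minimal sketch. The toy environment, the radial-basis features, and the use of clipping in place of the thesis's action-weighting scheme are assumptions made for illustration, not the author's implementation.

```python
# Minimal sketch (not the thesis implementation): a linear actor-critic with
# TD error and eligibility traces on an assumed 1-D regulation task.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 10
GAMMA, ALPHA_W, ALPHA_THETA, LAMBDA = 0.95, 0.1, 0.01, 0.9
A_MIN, A_MAX = -1.0, 1.0          # range constraint on the action

def features(s):
    """Radial-basis features over the state interval [-1, 1] (assumed)."""
    centers = np.linspace(-1.0, 1.0, N_FEATURES)
    return np.exp(-((s - centers) ** 2) / 0.1)

def step(s, a):
    """Toy dynamics (assumed): the action should steer the state toward 0."""
    s_next = np.clip(s + 0.1 * a + 0.01 * rng.normal(), -1.0, 1.0)
    return s_next, -s_next ** 2

w = np.zeros(N_FEATURES)        # critic weights,  V(s) = w . phi(s)
theta = np.zeros(N_FEATURES)    # actor weights,   mean(a|s) = theta . phi(s)
sigma = 0.2                     # fixed exploration noise

for episode in range(200):
    s = rng.uniform(-1.0, 1.0)
    z_w = np.zeros(N_FEATURES)       # critic eligibility trace
    z_theta = np.zeros(N_FEATURES)   # policy eligibility trace
    for t in range(100):
        phi = features(s)
        mu = theta @ phi
        a = np.clip(mu + sigma * rng.normal(), A_MIN, A_MAX)  # constrained action
        s_next, r = step(s, a)
        # The TD error drives both the critic and the policy update.
        delta = r + GAMMA * (w @ features(s_next)) - (w @ phi)
        z_w = GAMMA * LAMBDA * z_w + phi
        z_theta = GAMMA * LAMBDA * z_theta + (a - mu) / sigma ** 2 * phi
        w += ALPHA_W * delta * z_w
        theta += ALPHA_THETA * delta * z_theta
        s = s_next
```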
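Item ii rests on replacing the stochastic (vanilla) policy gradient with the natural gradient, i.e. preconditioning the gradient with the inverse Fisher information matrix. The following sketch shows one such update for a linear-Gaussian policy on an assumed toy objective; the batch setup, feature map, and regularization constant are illustrative choices, not details from the thesis.

```python
# Minimal sketch (assumed setup, not the thesis code): one natural-gradient
# policy update. The vanilla gradient g is preconditioned by the inverse of
# the empirical Fisher matrix F built from the score functions.
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, SIGMA, LR = 8, 0.3, 0.05

def features(s):
    centers = np.linspace(-1.0, 1.0, N_FEATURES)
    return np.exp(-((s - centers) ** 2) / 0.1)

theta = np.zeros(N_FEATURES)

# Collect a batch of (score, return) samples from an assumed one-step objective.
scores, returns = [], []
for s in rng.uniform(-1.0, 1.0, size=200):
    phi = features(s)
    mu = theta @ phi
    a = mu + SIGMA * rng.normal()
    scores.append((a - mu) / SIGMA ** 2 * phi)   # grad of log pi w.r.t. theta
    returns.append(-(s + a) ** 2)                # assumed reward: drive s + a to 0
scores, returns = np.array(scores), np.array(returns)

g = (scores * returns[:, None]).mean(axis=0)                 # vanilla policy gradient
F = (scores[:, :, None] * scores[:, None, :]).mean(axis=0)   # empirical Fisher matrix
natural_g = np.linalg.solve(F + 1e-3 * np.eye(N_FEATURES), g)  # F^{-1} g (regularized)
theta += LR * natural_g
```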
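The double value function idea in item iii is closely related to double estimators such as double Q-learning: two value estimators are maintained, and each bootstraps from the other so that neither evaluates the next state with its own, possibly overestimated, values. The sketch below shows this cross-bootstrapping for two linear critics on an assumed random-walk task; the alternating update rule is an analogy for illustration, not the thesis's exact algorithm.

```python
# Minimal sketch (an analogy, not the thesis algorithm): two linear critics
# that bootstrap from each other to reduce maximization bias.
import numpy as np

rng = np.random.default_rng(2)
N_FEATURES, GAMMA, ALPHA = 8, 0.95, 0.1

def features(s):
    centers = np.linspace(-1.0, 1.0, N_FEATURES)
    return np.exp(-((s - centers) ** 2) / 0.1)

w_a = np.zeros(N_FEATURES)   # first value-function estimator
w_b = np.zeros(N_FEATURES)   # second value-function estimator

s = 0.5
for t in range(1000):
    s_next = np.clip(s + 0.1 * rng.normal(), -1.0, 1.0)  # assumed random-walk dynamics
    r = -s_next ** 2
    phi, phi_next = features(s), features(s_next)
    if rng.random() < 0.5:
        # Update the first critic, but evaluate the next state with the second.
        delta = r + GAMMA * (w_b @ phi_next) - (w_a @ phi)
        w_a += ALPHA * delta * phi
    else:
        # And vice versa, so neither estimator bootstraps from itself.
        delta = r + GAMMA * (w_a @ phi_next) - (w_b @ phi)
        w_b += ALPHA * delta * phi
    s = s_next
```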
Keywords/Search Tags: reinforcement learning, actor-critic, function approximation, continuous spaces