
Research On Weight Update Method In Temporal Difference Algorithm

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: B Li
Full Text: PDF
GTID: 2428330614959404
Subject: Computer technology
Abstract/Summary:
Reinforcement learning can deal with a variety of complex problems in the field of artificial intelligence and has broad application prospects. Function approximation methods evaluate the value function approximately and can handle problems with large-scale or continuous state and action spaces. Within function approximation, the temporal difference (TD) algorithm can learn online from experience in a model-free environment. This thesis focuses on TD algorithms based on function approximation and studies weight update methods based on gradient descent and least-squares, proposing several corresponding weight update methods. The main research comprises the following three parts (illustrative code sketches follow the abstract):

i. Least-squares methods can improve the convergence speed of the TD algorithm, but due to inaccurate state distributions and unreasonable exploration, the TD algorithm cannot achieve a satisfactory convergence result and easily falls into local optima. To address this problem, a double weighted learning method based on least-squares is proposed. This method combines two weights to determine the target weight, which not only preserves the fast convergence of the TD algorithm but also improves the algorithm's exploration ability and yields better learning performance.

ii. Least-squares methods consume considerable computing resources, and the cost grows as the state space expands. Gradient descent converges slowly and may cause the algorithm to diverge, but its computational cost is lower. In view of this, a weight gradient descent method is proposed. With the help of a projection operation and gradient descent, this new method converts the value function error into a weight error and then updates the weights directly. The weight gradient descent method can be applied to many other TD algorithms based on value functions. It gives up the convergence-speed advantage of least-squares in exchange for lower computational cost, while achieving better convergence and learning performance than the semi-gradient descent method.

iii. Deep reinforcement learning combines the ability to perceive high-dimensional states with the ability to make decisions, opening a wider space for the development of reinforcement learning. When using the weight gradient descent method to optimize the weights of a neural network, it is necessary to account for the projection operation in a nonlinear function, the solution of the weight errors, and the influence of the changing output of each network layer on the stability of the algorithm. To deal with these problems, a hybrid weight gradient descent (HWGD) method is proposed. It combines the weight gradient descent method with the semi-gradient descent method, can be applied to various TD algorithms based on value function approximation in deep reinforcement learning, and improves the learning performance of the algorithm.
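For context, here is a minimal sketch of the baseline the abstract builds on: semi-gradient TD(0) policy evaluation with linear function approximation, as found in standard reinforcement learning texts. The environment interface (`env.reset`, `env.step`), the feature map `phi`, and all parameter values are placeholders, not details from the thesis.

```python
import numpy as np

def semi_gradient_td0(env, phi, dim, episodes=500, alpha=0.05, gamma=0.99):
    """Semi-gradient TD(0) with a linear value estimate V(s) = phi(s) @ w.
    Assumed interface: env.reset() -> state; env.step(state) ->
    (next_state, reward, done) under the fixed policy being evaluated."""
    w = np.zeros(dim)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            s2, r, done = env.step(s)
            target = r if done else r + gamma * (phi(s2) @ w)
            delta = target - phi(s) @ w   # TD error
            w += alpha * delta * phi(s)   # semi-gradient: no gradient
            s = s2                        # is taken through the target
    return w
```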
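Part i builds on least-squares TD (LSTD), which solves for the weights in closed form from accumulated statistics rather than stepping along a gradient. The abstract does not spell out how the double weighted method combines its two weights, so the sketch below shows only the standard LSTD solve it starts from; `transitions` is an assumed list of `(phi_s, reward, phi_s_next, done)` tuples and `reg` is an assumed ridge term.

```python
import numpy as np

def lstd(transitions, dim, gamma=0.99, reg=1e-3):
    """Batch LSTD: solve A w = b, where
    A = sum over transitions of phi (phi - gamma * phi')^T
    b = sum over transitions of r * phi.
    The small ridge term keeps A invertible on limited data."""
    A = reg * np.eye(dim)
    b = np.zeros(dim)
    for phi_s, r, phi_s2, done in transitions:
        phi_next = np.zeros(dim) if done else phi_s2
        A += np.outer(phi_s, phi_s - gamma * phi_next)
        b += r * phi_s
    return np.linalg.solve(A, b)
```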
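Parts ii and iii describe converting the value function error into a weight error via a projection operation and updating the weights directly, with HWGD blending this update with the semi-gradient one. The abstract gives no formulas, so the following is only one plausible reading for the linear case: take the minimum-norm weight correction that would cancel the TD error at the current state (the value error projected back into weight space along the feature direction), and interpolate it with the semi-gradient step via a hypothetical coefficient `mix`.

```python
import numpy as np

def hybrid_weight_update(w, phi_s, delta, alpha=0.05, mix=0.5, eps=1e-8):
    """One illustrative hybrid step, NOT the thesis's exact rule.
    delta is the TD error at the current state with features phi_s.
    - weight-error step: minimum-norm dw solving phi_s @ dw = delta,
      i.e. the value error mapped back into weight space;
    - semi-gradient step: the usual alpha * delta * phi_s.
    mix interpolates between them (mix=0 recovers pure semi-gradient)."""
    weight_error = delta * phi_s / (phi_s @ phi_s + eps)  # projection step
    semi_grad = alpha * delta * phi_s                     # semi-gradient step
    return w + mix * weight_error + (1.0 - mix) * semi_grad
```

With `mix=1` the step is normalized by the feature magnitude (similar in spirit to normalized LMS), which matches the abstract's claim that the weight error, rather than the raw gradient, drives the update. The deep-RL variant (HWGD) would apply such blending through nonlinear layers, which is where the stability concerns the abstract mentions arise.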
Keywords/Search Tags:Reinforcement Learning, Function Approximation, Temporal Difference, Least-Squares, Semi-Gradient Descent, Deep Reinforcement Learning