
Research On Weight Update Method In Temporal Difference Algorithm

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: B Li
Full Text: PDF
GTID: 2428330614959404
Subject: Computer technology
Abstract/Summary:
Reinforcement learning can deal with a variety of complex problems in the field of artificial intelligence and has broad application prospects. Function approximation methods evaluate the value function approximately and can handle problems with large-scale or continuous state and action spaces. Within function approximation, the temporal difference (TD) algorithm can learn online from experience in a model-free environment. This thesis focuses on TD algorithms based on function approximation and studies weight update methods based on gradient descent and least-squares, proposing several corresponding weight update methods. The main research comprises the following three parts (illustrative code sketches follow the abstract):

i. Least-squares methods can improve the convergence speed of the TD algorithm, but due to inaccurate state distributions and unreasonable exploration, the TD algorithm cannot achieve a satisfactory convergence result and easily falls into local optima. To address this problem, a double weighted learning method based on least-squares is proposed. This method combines two weights to determine the target weight, which not only preserves the fast convergence of the TD algorithm but also improves the algorithm's exploration ability and yields better learning performance.

ii. Least-squares methods consume considerable computing resources, and the cost grows as the state space expands. Gradient descent converges slowly and may cause the algorithm to diverge, but its computational cost is lower. In view of this, a weight gradient descent method is proposed. With the help of a projection operation and gradient descent, this new method converts the value function error into a weight error and then updates the weights directly. The weight gradient descent method can be applied to many other TD algorithms based on value functions. It gives up the convergence-speed advantage of least-squares in exchange for lower computational cost, while achieving better convergence and learning performance than the semi-gradient descent method.

iii. Deep reinforcement learning combines the ability to perceive high-dimensional states with the ability to make decisions, opening a wider space for the development of reinforcement learning. When using the weight gradient descent method to optimize the weights of a neural network, it is necessary to account for the projection operation in a nonlinear function, the solution of the weight errors, and the influence of the changing output of each network layer on the stability of the algorithm. To deal with these problems, a hybrid weight gradient descent (HWGD) method is proposed. It combines the weight gradient descent method with the semi-gradient descent method, can be applied to various TD algorithms based on value function approximation in deep reinforcement learning, and improves the learning performance of the algorithm.
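For context, here is a minimal sketch of the baseline the abstract builds on: semi-gradient TD(0) policy evaluation with linear function approximation, as found in standard reinforcement learning texts. The environment interface (`env.reset`, `env.step`), the feature map `phi`, and all parameter values are placeholders, not details from the thesis.

```python
import numpy as np

def semi_gradient_td0(env, phi, dim, episodes=500, alpha=0.05, gamma=0.99):
    """Semi-gradient TD(0) with a linear value estimate V(s) = phi(s) @ w.
    Assumed interface: env.reset() -> state; env.step(state) ->
    (next_state, reward, done) under the fixed policy being evaluated."""
    w = np.zeros(dim)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            s2, r, done = env.step(s)
            target = r if done else r + gamma * (phi(s2) @ w)
            delta = target - phi(s) @ w   # TD error
            w += alpha * delta * phi(s)   # semi-gradient: no gradient
            s = s2                        # is taken through the target
    return w
```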
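Part i builds on least-squares TD (LSTD), which solves for the weights in closed form from accumulated statistics rather than stepping along a gradient. The abstract does not spell out how the double weighted method combines its two weights, so the sketch below shows only the standard LSTD solve it starts from; `transitions` is an assumed list of `(phi_s, reward, phi_s_next, done)` tuples and `reg` is an assumed ridge term.

```python
import numpy as np

def lstd(transitions, dim, gamma=0.99, reg=1e-3):
    """Batch LSTD: solve A w = b, where
    A = sum over transitions of phi (phi - gamma * phi')^T
    b = sum over transitions of r * phi.
    The small ridge term keeps A invertible on limited data."""
    A = reg * np.eye(dim)
    b = np.zeros(dim)
    for phi_s, r, phi_s2, done in transitions:
        phi_next = np.zeros(dim) if done else phi_s2
        A += np.outer(phi_s, phi_s - gamma * phi_next)
        b += r * phi_s
    return np.linalg.solve(A, b)
```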
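Parts ii and iii describe converting the value function error into a weight error via a projection operation and updating the weights directly, with HWGD blending this update with the semi-gradient one. The abstract gives no formulas, so the following is only one plausible reading for the linear case: take the minimum-norm weight correction that would cancel the TD error at the current state (the value error projected back into weight space along the feature direction), and interpolate it with the semi-gradient step via a hypothetical coefficient `mix`.

```python
import numpy as np

def hybrid_weight_update(w, phi_s, delta, alpha=0.05, mix=0.5, eps=1e-8):
    """One illustrative hybrid step, NOT the thesis's exact rule.
    delta is the TD error at the current state with features phi_s.
    - weight-error step: minimum-norm dw solving phi_s @ dw = delta,
      i.e. the value error mapped back into weight space;
    - semi-gradient step: the usual alpha * delta * phi_s.
    mix interpolates between them (mix=0 recovers pure semi-gradient)."""
    weight_error = delta * phi_s / (phi_s @ phi_s + eps)  # projection step
    semi_grad = alpha * delta * phi_s                     # semi-gradient step
    return w + mix * weight_error + (1.0 - mix) * semi_grad
```

With `mix=1` the step is normalized by the feature magnitude (similar in spirit to normalized LMS), which matches the abstract's claim that the weight error, rather than the raw gradient, drives the update. The deep-RL variant (HWGD) would apply such blending through nonlinear layers, which is where the stability concerns the abstract mentions arise.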
Keywords/Search Tags:Reinforcement Learning, Function Approximation, Temporal Difference, Least-Squares, Semi-Gradient Descent, Deep Reinforcement Learning