
Research On Reinforcement Learning Network Algorithm With Self-adaptive Basis Function

Posted on: 2019-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Wang
Full Text: PDF
GTID: 2428330551958020
Subject: Control Science and Engineering

Abstract/Summary:
Reinforcement learning is one of the most important areas of machine learning, in which an agent learns how to act by interacting with its environment. Reinforcement learning is a process of seeking an optimal policy. Policy evaluation is a central method in reinforcement learning: it assesses a given policy based on its value function. In general, the value function is estimated by linearly parameterized function approximation. Previous algorithms improve their accuracy by tuning the network weights while leaving the basis functions fixed. In fact, the basis functions used in value function approximation have a significant impact on the ability of the algorithm. The centers of the basis functions can be determined from the problem to be solved; however, it is difficult to set the widths of the basis functions, and researchers usually run many experiments to find proper width values. In this research, a neural network architecture is used to realize reinforcement learning algorithms in which the widths of the basis functions are tuned automatically. The main contributions of this paper are as follows:

1. An adaptive RC-network is proposed. In the proposed algorithm, the parameters of the basis functions, especially their widths, are tuned automatically toward the optimum during value function approximation. The TD error and the value function are estimated by value function approximation and RC reinforcement learning; the TD error is then back-propagated to update the parameters of the basis functions, namely the widths and weights. In this way, the proposed algorithm adapts toward optimal performance during learning. Experimental results and theoretical analysis show that this algorithm performs better in both policy evaluation and policy iteration. For policy evaluation, the proposed algorithm approximates the value function more accurately and more steadily. For policy iteration, compared with traditional methods, it takes fewer steps to reach the goal, with a steadier learning process, on control problems.

2. An adaptive iLSTDC-network is proposed, which confirms the practicability of the network architecture. In this algorithm, the value function and the TD error are approximated by a network; the TD error is then back-propagated to update the widths of the basis functions and the network weights. In this way, the network is optimized gradually by tuning its parameters. Finally, the optimized network is employed for policy evaluation and policy iteration. Experimental results show that this algorithm achieves better performance, while the computation time per step is dramatically reduced.
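The core width-adaptation idea shared by both contributions can be sketched as semi-gradient TD(0) over Gaussian radial basis functions, where the TD error is back-propagated into both the linear weights and the basis-function widths. This is an illustrative sketch only: the RC and iLSTDC update rules, the step sizes, and the toy one-dimensional task below are assumptions for demonstration, not the thesis's actual algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian RBF features over a 1-D state space. Centers are fixed
# (determined from the problem, as in the abstract); the widths
# sigma are the parameters tuned automatically online.
centers = np.linspace(0.0, 1.0, 10)   # fixed RBF centers
sigma = np.full(10, 0.2)              # widths, adapted during learning
w = np.zeros(10)                      # linear output weights

def phi(s, sigma):
    """Gaussian RBF feature vector for scalar state s."""
    return np.exp(-(s - centers) ** 2 / (2.0 * sigma ** 2))

def value(s, w, sigma):
    """Approximate value function V(s) = phi(s)^T w."""
    return phi(s, sigma) @ w

def td_step(s, r, s_next, w, sigma, gamma=0.9, alpha_w=0.1, alpha_s=0.01):
    """One semi-gradient TD(0) update of both weights and widths.

    The TD error delta is 'back-propagated': weights move along
    dV/dw = phi(s), and widths along dV/dsigma_i = w_i * dphi_i/dsigma_i.
    """
    f = phi(s, sigma)
    delta = r + gamma * value(s_next, w, sigma) - f @ w   # TD error
    w = w + alpha_w * delta * f
    # Chain rule: dphi_i/dsigma_i = phi_i * (s - c_i)^2 / sigma_i^3
    dphi_dsigma = f * (s - centers) ** 2 / sigma ** 3
    sigma = sigma + alpha_s * delta * w * dphi_dsigma
    sigma = np.clip(sigma, 1e-2, None)  # keep widths positive
    return w, sigma, delta

# Evaluate a fixed policy on a toy chain with reward r = s.
for _ in range(2000):
    s = rng.uniform(0.0, 1.0)
    s_next = min(s + 0.1, 1.0)
    w, sigma, delta = td_step(s, s, s_next, w, sigma)
```

The point of the sketch is the second update line: instead of treating the widths as hand-tuned hyperparameters, the same TD error that drives the weight update also drives the widths, so the feature representation and the value estimate are optimized jointly.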
Keywords/Search Tags:reinforcement learning, policy evaluation, policy iteration, value function, function approximation, basis function